http://analog.cx/docs/cache.html is what you want.
However, when your data volume gets to this size you should also start considering what your real needs are and what's important to analyze. Sure, it might be *nice* to have cumulative reports for all pages, but really, you are probably more interested in how those stats change each month or quarter. Doing iterative reports (and archiving them) allows you to compare trends over time.
Also, as you delve more into the reports, you will start asking specific questions that require investigative research (has the number of Firefox users vs IE users significantly affected our site? How?). These are best answered with *INCLUDE commands to filter data. Those need to be run on the original log files (cache files don't contain the bindings necessary to, say, look at all requests by Firefox users). So you'll probably want to keep the original logs around, although they do compress well.
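To illustrate why that kind of question needs the original logs, here is a minimal Python sketch that counts requested paths per browser family. It is not how analog does its filtering; it only shows that the answer depends on seeing the request and the user agent together on each line. It assumes the logs are in Combined Log Format, and the names `LINE_RE`, `browser_family` and `pages_by_browser` are made up for this example.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def browser_family(agent):
    """Very rough classification by user-agent substring (illustrative only)."""
    if "Firefox" in agent:
        return "Firefox"
    if "MSIE" in agent:
        return "IE"
    return "other"


def pages_by_browser(lines):
    """Count (browser family, requested path) pairs from raw log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        parts = match.group("request").split()
        if len(parts) < 2:
            continue  # malformed request field
        counts[(browser_family(match.group("agent")), parts[1])] += 1
    return counts
```

A summary that only stores per-page totals and per-browser totals separately cannot produce these pairs, which is the reason for keeping the raw (compressed) logs.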
Hope that helps.
-- Jeremy Wadsack Seven Simple Machines
Brian Szymanski wrote:
Hi.
My organization wants cumulative reports (i.e. every log we have since launch) updated every week. This was fine for the first few months, just brute forcing it, but we're coming up on our one-year birthday now and it's taking a good hour of CPU time and lots of I/O hits on a 3GHz-ish P4 to churn through all those logs every night.
We're a moderate traffic website I'd say, almost top 10,000 according to Alexa but not quite:

Average successful requests per day: 114,873
Average successful requests for pages per day: 84,555
Average data transferred per day: 14.765 gigabytes

But by no means one of the biggest out there - there are clearly sites that get 10-100 times the traffic we get. Like most, our organization hopes to expand, and get more traffic...
This raises the question: Churning through every log since the birth of the website is a lot of work. Is there (or should there be) a way to get analog to dump its state into some file, in a way that is faster to parse than a logfile but has all the same info? Assuming this file could be read relatively quickly, the cost would just be:

(time to read state file) + (time to process new log file)

instead of (roughly):

numdays * (time to process new log file)
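The state-file idea can be sketched in a few lines of Python. This is only an illustration of the general technique (keep running totals on disk, fold in each new log), not analog's own mechanism or file format. It assumes a deliberately simplified log with one "path status" pair per line and a JSON state file; `count_requests`, `load_state`, `save_state` and `update` are hypothetical names.

```python
import json
from collections import Counter


def count_requests(log_lines):
    """Count successful (2xx) requests per path from 'path status' lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) == 2 and fields[1].startswith("2"):
            counts[fields[0]] += 1
    return counts


def load_state(path):
    """Read the saved totals, or start empty if no state file exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return Counter(json.load(fh))
    except FileNotFoundError:
        return Counter()


def save_state(path, counts):
    """Write the running totals back to the state file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(counts, fh)


def update(state_path, new_log_lines):
    """Fold one new log into the saved totals: state read + one log parse."""
    totals = load_state(state_path)
    totals.update(count_requests(new_log_lines))  # Counter.update adds counts
    save_state(state_path, totals)
    return totals
```

Each run then costs one read of the state file plus one pass over the new log, regardless of how many days of history the totals already cover.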
Any ideas?
Thanks in advance, Brian
Brian Szymanski
Software and Systems Developer
Media Matters for America
[EMAIL PROTECTED]

