Cache files are what you want: http://analog.cx/docs/cache.html

However, when your data volume gets to this size you should also start considering what your real needs are and what's important to analyze. Sure, it might be *nice* to have cumulative reports for all pages, but really, you are probably more interested in how those stats change each month or quarter. Running periodic reports (and archiving them) lets you compare trends over time.

Also, as you delve more into the reports, you will start asking specific questions that require investigative research (has the number of Firefox users vs IE users significantly affected our site? How?). These are best answered with the *INCLUDE commands, which filter the data. They need to be run on the original log files (cache files don't contain the bindings necessary to, say, look at all requests by Firefox users). So you'll probably want to keep the original logs around, although they do compress well.

Hope that helps.

--
Jeremy Wadsack
Seven Simple Machines


Brian Szymanski wrote:

Hi.

My organization wants cumulative reports (i.e. every log we have since
launch) updated every week. This was fine for the first few months, just
brute forcing it, but we're going on our 1 year birthday now and it's
taking a good hour of cpu time and lots of io hits on a 3GHzish p4 to
churn through all those logs every night.

We're a moderate-traffic website I'd say, almost top 10,000 according to
Alexa but not quite:
Average successful requests per day: 114,873
Average successful requests for pages per day: 84,555
Average data transferred per day: 14.765 gigabytes
But by no means one of the biggest out there - there are clearly sites
that get 10-100 times the traffic we get. Like most, our organization
hopes to expand, and get more traffic...

This raises the question: churning through every log since the birth of
the website is a lot of work. Is there (or should there be) a way to get
analog to dump its state into some file that is faster to parse than a
logfile but has all the same info? Assuming this file could be read
relatively quickly, each run would cost just:
 (time to read state file) + (time to process new log file)
instead of (roughly)
 numdays * (time to process new log file)
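
To put rough numbers on the comparison above, here is a minimal Python sketch of the two costs. The function names and the 5-second state-read time are hypothetical, chosen only for illustration; the per-log time is derived from the "good hour of cpu time" for about a year of logs mentioned earlier. It models the arithmetic only and says nothing about how analog itself stores or reads state.

```python
# Rough cost model for the two strategies compared above.
# All timings are illustrative assumptions, not measurements of analog.


def full_reprocess_seconds(num_days: int, per_log_seconds: float) -> float:
    """Cost of re-reading every daily log since launch on each run."""
    return num_days * per_log_seconds


def incremental_seconds(state_read_seconds: float, per_log_seconds: float) -> float:
    """Cost of reading a saved state file, then only the newest log."""
    return state_read_seconds + per_log_seconds


if __name__ == "__main__":
    num_days = 365                 # roughly one year of daily logs
    per_log = 3600 / num_days      # about an hour in total, spread over a year
    state_read = 5.0               # hypothetical time to load the state file

    full = full_reprocess_seconds(num_days, per_log)
    incremental = incremental_seconds(state_read, per_log)
    print(f"full reprocess: {full:.0f} s per run")
    print(f"incremental:    {incremental:.1f} s per run")
```

The point of the model is that the full-reprocess cost grows linearly with the age of the site, while the incremental cost stays roughly flat as long as the state file remains quick to read.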

Any ideas?

Thanks in advance,
Brian

Brian Szymanski
Software and Systems Developer
Media Matters for America
[EMAIL PROTECTED]



