So I wrote a bash script that goes through Apache log files, looks for
certain elements, counts how many times each element appears, and prints
out the result.  It took 4 hours to run against ~470MB of logs, which
accumulated over seven days of fairly light traffic compared to what we
expect to see in the coming months.

I'll probably wind up having to rotate these logs daily, but I could
easily see 500MB+ worth of logs per day.  So, I need to get my counting
done a lot faster.  I'm starting to teach myself some perl, but will
that be fast enough?  Is the answer going to be a compiled C
application?  Or is there another tool that might be more appropriate
for this task?
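
For concreteness, the kind of counting I mean looks roughly like this in
the Perl I'm picking up (a sketch, not my actual script; it assumes the
"element" is the requested path, field 7 of a standard access_log line,
whereas my real script looks for several different elements):

#!/usr/bin/perl
# Rough sketch only, not my actual script.  Assumes the "element" being
# counted is the requested path, i.e. field 7 of a common/combined
# format access_log line; the real elements differ.
use strict;
use warnings;

my %count;

while (my $line = <>) {
    my @fields = split ' ', $line;      # split on whitespace
    next unless defined $fields[6];     # skip malformed lines
    $count{ $fields[6] }++;             # tally each element in one pass
}

# Print elements sorted by count, busiest first.
for my $element (sort { $count{$b} <=> $count{$a} } keys %count) {
    print "$count{$element}\t$element\n";
}

Run as something like "perl count_elements.pl access_log*".  The idea is
a single pass over the data with a hash, instead of re-reading the files
once per element.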

There is no hard and fast "this script must be able to run in this
amount of time" requirement.  Just "as fast as possible", mainly because
I only expect its job to get harder and harder until we break down and
use something like PHPOpenTracker to write our own WebSideStory-type of
app.

I'm also wondering how much difference hardware could make.  The bash
script ran on a single P4 2.4GHz CPU with 1GB RAM.  I'm not sure whether
the bottleneck is CPU, disk I/O, or something else when processing that
much text :-)

-- 
***********************************************************************
* John Oliver                             http://www.john-oliver.net/ *
*                                                                     *
***********************************************************************

