So I wrote a bash script that goes through Apache log files, looks for certain elements, counts how many times each element appears, and prints out the result. It took 4 hours to run against ~470MB of logs, which accumulated over seven days of relatively light traffic compared to what we expect to see in the coming months.
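For context, the counting boils down to a single pass that tallies one field per line. Something like this rough Perl sketch is the sort of thing I have in mind as a replacement; the combined log format and the field position are just assumptions on my part, not what my bash script literally does:

    #!/usr/bin/perl
    # One pass over the logs, tallying a single field into a hash.
    use strict;
    use warnings;

    my %count;
    while (my $line = <>) {
        # With the combined log format, splitting on whitespace puts the
        # request path ("GET /foo HTTP/1.0" -> "/foo") in field 6.
        my @fields = split ' ', $line;
        next unless @fields > 6;
        $count{$fields[6]}++;
    }

    # Print each element with its count, most frequent first.
    for my $url (sort { $count{$b} <=> $count{$a} } keys %count) {
        print "$count{$url}\t$url\n";
    }

I'd run it as "perl count.pl /var/log/httpd/access_log*", so the whole job is one process and one pass instead of forking tools per element the way the bash version does.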
I'll probably wind up having to rotate these logs daily, but I could easily see 500MB+ worth of logs per day, so I need to get my counting done a lot faster. I'm starting to teach myself some Perl, but will that be fast enough? Is the answer going to be a compiled C application? Or is there another tool that might be more appropriate for this task?

There is no hard and fast "this script must run in this amount of time" requirement, just "as fast as possible", mainly because I expect its job to keep getting harder until we break down and use something like PHPOpenTracker to write our own WebSideStory-type of app.

I'm also wondering how much difference hardware could make. The bash script ran on a single P4 2.4GHz CPU with 1GB RAM, and I'm not sure where the bottleneck is with processing lots and lots and lots of text :-)

--
John Oliver                               http://www.john-oliver.net/
