On Thu, Jun 08, 2006 at 01:49:16PM -0700, Michael O'Keefe wrote:
> John Oliver wrote:
> >So I wrote a bash script that goes through Apache log files, looks for
> >certain elements, counts how many times each element appears, and
> >prints out the result. It took 4 hours to run against ~470MB of logs,
> >which were accumulated in seven days with not much going on, compared
> >to what we expect to see in the coming months.
> 
> I suppose it really depends on what you're counting, and how you're
> counting it.
> 
> I only have about 30MiB of logs to run my scans through to find (for
> example) the most common referrers, google search terms, unique IP
> connects, etc., but it only takes a few seconds to get me the results.
> Your logs are more than 15 times larger, though, and I don't know what
> you are looking for.
I start by grepping for lines that include "project.xml", and then use
grep -v to drop lines that include a couple of other strings. Everything
that's left goes through a couple of cuts to pull out the field I want.
That output is sorted and run through uniq to find out how many distinct
elements there are, and then I use a loop over the results of uniq to go
back through the sorted list and count how many times each element
appears.

FWIW, there are about 2.2 million lines in my sample.
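Roughly, the pipeline looks like this. The extra filter strings, the cut
field number, and the file names are made up for illustration; I'm only
sketching the shape of it:

    #!/bin/bash
    # Filter down to the lines of interest, isolate one field, sort it.
    # 'otherstring1'/'otherstring2' and field 7 are placeholders, not
    # the real values.
    grep 'project.xml' access_log* \
        | grep -v 'otherstring1' \
        | grep -v 'otherstring2' \
        | cut -d' ' -f7 \
        | sort > elements.sorted

    # List the distinct elements, then re-scan the sorted file once per
    # element to count its occurrences (assumes elements contain no
    # whitespace).
    for element in $(uniq elements.sorted); do
        echo "$(grep -c "^${element}$" elements.sorted) ${element}"
    done

The loop is probably where most of the four hours goes: it makes one
full pass over the sorted file for every distinct element.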
-- 
***********************************************************************
* John Oliver                            http://www.john-oliver.net/ *
*                                                                    *
***********************************************************************