On Jun 8, 2006, at 6:22 PM, Andrew Lentvorski wrote:

John Oliver wrote:
I start by grepping for lines that include "project.xml", and then grep
-v lines that include a couple of other strings of characters.
Everything that's left goes through a couple of cuts to get the field I want. That output is sorted and run through uniq to find out how many different elements there are, and then I use a loop with the results of
uniq to go back through the sorted list to count how many times each
element appears.

This is definitely a job for a regex engine inside a programming language. Perl will be the most succinct. Python will be readable in 6 months. Your call. ;)

You can anchor on the project.xml and chop the appropriate pieces out into variables. Use hashes for uniqueness counting.


This snippet will grab the seventh field of every line containing "project.xml" and stuff it into an array.

-----
open(LOGFILE, "< /path/to/logfile")
  or die "Can't open the logfile: $!\n";

my (@array);
while (my $line = <LOGFILE>) {

  # we don't care about lines that don't
  # have "project.xml" in them
  next unless ($line =~ m/project\.xml/);

  # We want the seventh field, because it's
  # more interesting than the first or last
  # Fields are space-separated.
  # split() the line on spaces, which returns
  # an array, then grab the 7th element and
  # stuff it in $target.  Arrays are 0-indexed.
  my $target = (split(/ /, $line))[6];

  push (@array, $target);

}
-----

Without knowing the details of the data, I'm not sure how I'd stuff them into a hash to do the rest.
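
That said, if the field values are plain strings, something along these lines might cover the counting part. It's only a rough sketch: it assumes the seventh space-separated field really is the one you want, and count_fields() and the sample lines are made up for illustration, not taken from your logs.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of log lines and returns a
# hash ref mapping each seventh field to how often it appears.
sub count_fields {
    my @lines = @_;
    my %count;

    for my $line (@lines) {
        # skip lines that don't mention project.xml
        next unless $line =~ m/project\.xml/;

        chomp $line;
        my $target = (split(/ /, $line))[6];

        # skip short lines that have no seventh field
        next unless defined $target;

        # the hash key is the element, the value is its count
        $count{$target}++;
    }

    return \%count;
}

# Made-up sample data, just to show the shape of the result.
my @sample = (
    "a b c d e f alpha project.xml",
    "a b c d e f beta project.xml",
    "a b c d e f alpha project.xml",
    "a b c d e f gamma other.xml",
);

my $count = count_fields(@sample);
for my $key (sort keys %$count) {
    print "$key: $count->{$key}\n";
}
```

The keys of the hash are the distinct elements, so scalar(keys %$count) gives you the number that the sort and uniq step was producing, and the values are the per-element counts that the loop was producing.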

I can honestly say, though, that Perl is designed for this kind of stuff. I replaced a whole suite of shell scripts that generated our sendmail aliases files here at the CSE dept. with a Perl script. The original scripts took 3+ minutes to generate the aliases files on a Blade 500. The Perl script took 0.015 seconds to do the exact same thing. Core support for regular expressions, arrays/lists, hashes and friends is _nice_.

Gregory

--
Gregory K. Ruiz-Ade <[EMAIL PROTECTED]>
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu

