I'm trying to analyze web log records that look like this:

2004-03-28 00:38:31 d7.facsmf.utexas.edu - W3SVC1 DB db.jhuccp.org GET /dbtw-wpd/exec/dbtwpcgi.exe XC=%2Fdbtw-wpd%2Fexec%2Fdbtwpcgi.exe&BU=http%3A%2F%2Fdb.jhuccp.org%2Fpopinform%2Fbasic.html&QB0=AND&QF0=Abstract+%7C+KeywordsMajor+%7C+KeywordsMinor+%7C+Notes+%7C+EngTitle+%7C+TT+%7C+FREAb+%7C+SPAAb&QI0=China%0D%0A&QB1=AND&QF1=Author+%7C+CN&QI1=&MR=10&TN=popline&RF=ShortRecordDisplay&DF=LongRecordDisplay&DL=1&RL=1&NP=0&AC=QBE_QUERY&x=37&y=4 200 0 21248 814 19391 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705) - http://db.jhuccp.org/popinform/basic.html

In this record, the tenth space-delimited field, which starts with "XC=%2Fdbtw", contains variables whose names start with "QF" followed by a number, for instance:

QF0=Abstract+%7C+KeywordsMajor+%7C+KeywordsMinor+%7C+Notes+%7C+EngTitle+%7C+TT+%7C+FREAb+%7C+SPAAb&

This indicates that the fields to be searched in the database are "Abstract KeywordsMajor KeywordsMinor...". The "QI" variable with the same number, in this case "QI0=China%0D%0A", indicates a search for "China" in these fields. For every "QF" variable there should be a corresponding "QI" variable with the same number, although its value might be blank, as in "QF1=Author+%7C+CN&QI1=&". That part of the example indicates that a search should be performed in the "Author" and "CN" fields, but the value for "QI1" is blank, so it matches everything.

My program, which I've pasted in below my signature, tries to find a "QF" value and match it against a list of field names (for example, if the list of fields to be searched contains the "Abstract" field, it should be considered a "subject" search), then grabs the corresponding "QI" value and prints it out. However, I can never match anything beyond the digit. In my program below, the line:

print "Match successful!\n" if ($query =~ /QI$1/);

works, but the next three lines:

$query =~ /QI$1=(.*?)&/;
$subject = $1;
print "Subject: $1\n" if ($debug);

never match anything.

I've been working on this, on and off, for the last two days. Any suggestions, or pointers to my boneheaded errors, are gratefully appreciated. Any other overall suggestions on my coding are welcome. This script seems to run very slowly, probably because of all the complex regexes.

Thanks for all your help and suggestions.

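One possible explanation for the failing lines, offered as a sketch and not a confirmed diagnosis: in Perl, $1 and $2 always refer to the most recent successful match. The test ($2 =~ /abstract/i) succeeds and has no parentheses, so by the time /QI$1/ is compiled, $1 is undefined and the pattern is effectively /QI/. That would explain why the "Match successful!" line works while /QI$1=(.*?)&/ does not. The example below copies the captures into lexical variables before running any other match. The helper name qi_for_field and the shortened query string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of listqueries3.pl: return the raw QIn value
# paired with the first QFn whose field list mentions $wanted.
sub qi_for_field {
    my ($query, $wanted) = @_;
    while ($query =~ /QF(\d+)=([^&]*)/g) {
        # Copy the captures immediately. Any later successful match,
        # even one without parentheses, resets $1 and $2.
        my ($num, $fields) = ($1, $2);
        next unless $fields =~ /\Q$wanted\E/i;
        # Anchor on '&' (or start of string) and '=' so that QI1
        # cannot match inside a longer name such as QI10.
        return $1 if $query =~ /(?:^|&)QI$num=([^&]*)/;
    }
    return;    # no matching QF/QI pair found
}

# Shortened, made-up query string in the same shape as the log field.
my $query = 'QB0=AND&QF0=Abstract+%7C+KeywordsMajor&QI0=China%0D%0A'
          . '&QB1=AND&QF1=Author+%7C+CN&QI1=&MR=10';

my $subject = qi_for_field($query, 'abstract');
my $author  = qi_for_field($query, 'author');

print "Subject: $subject\n";    # prints "Subject: China%0D%0A"
print "Author: [$author]\n";    # prints "Author: []"
```

Separately, the pattern /endtitle/i in the last elsif may be a typo for "EngTitle", the name that appears in the sample QF0 list; that is only a guess from the sample record.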
-Kevin Zembower

centernet:/opt/analog/logdata/db # cat listqueries3.pl
#!/usr/local/bin/perl

$debug = 1;

while (<>) {
    next unless (/TN=popline/i); #Just analyze the records for the POPLINE database
    $subject = $author = $docno = $title = $journalsource = $keywords = $languages = $popreporttopic = $refereed = $year = "";
    if (/^.* .* .* .* .* .* .* GET [^ ]*dbtwpcgi\.exe .*QI0=[^&]*&.*QI1=[^&]*&.*/){
        if (/QI2/) { $type = "A"; }
        else { $type = "B"; }
        ($date, $time, $source, $junk, $junk, $host, $FQDN, $method, $file, $query, $junk) = split;
        while ($query =~ m/QF(\d+)=(\S*?)&/ig) {
            print "fieldnumber = :$1:\tfieldname = $2\n" if ($debug);
            if ($2 =~ /abstract/i) {
                print "Abstract found!\n" if ($debug);
                print "Query: $query\n" if ($debug);
                print "Match successful!\n" if ($query =~ /QI$1/);
                $query =~ /QI$1=(.*?)&/;
                $subject = $1;
                print "Subject: $1\n" if ($debug);
            } elsif ($2 =~ /author/i) {
                $query =~ /QI$1=(\S*?)&/;
                $author = $1;
            } elsif ($2 =~ /endtitle/i) {
                $query =~ /QI$1=(\S*?)&/;
                $title = $1;
            }
        } #while there are more matches for QFn fields
        $outstring = "$type\t$date\t$time\t$subject\t$author\t$title\t$journalsource\t$keywords\t$languages\t$popreporttopic\t$refereed\t$year\n";
        print translate($outstring);
    }# if it's a request for a database query
}# while there are more lines in the input file

sub translate() {
    $_ = $_[0];
    s/%22/\"/g;
    s/%2C/,/g;
    s/%20/ /g;
    s#%2F#/#g;
    s/%3D/=/g;
    s/%3B/;/g;
    s/%26/&/g;
    s/%0D//g;
    s/%0A//g;
    s/\+/ /g;
    s/%29/)/g;
    s/%28/(/g;
    s/%27/\' /g;
    s/%2b/+/g;
    s/%7C/|/g;
    s/%3A/:/g;
    #Debbie requested that all boolean logical words and symbols be replaced with '|'
    s/\b(and)\b/|/ig;
    s/\b(or)\b/|/ig;
    s/&/|/g;
    s[/][|]g;
    $_;
}

centernet:/opt/analog/logdata/db # cat v
2004-03-28 00:38:31 d7.facsmf.utexas.edu - W3SVC1 DB db.jhuccp.org GET /dbtw-wpd/exec/dbtwpcgi.exe XC=%2Fdbtw-wpd%2Fexec%2Fdbtwpcgi.exe&BU=http%3A%2F%2Fdb.jhuccp.org%2Fpopinform%2Fbasic.html&QB0=AND&QF0=Abstract+%7C+KeywordsMajor+%7C+KeywordsMinor+%7C+Notes+%7C+EngTitle+%7C+TT+%7C+FREAb+%7C+SPAAb&QI0=China%0D%0A&QB1=AND&QF1=Author+%7C+CN&QI1=&MR=10&TN=popline&RF=ShortRecordDisplay&DF=LongRecordDisplay&DL=1&RL=1&NP=0&AC=QBE_QUERY&x=37&y=4 200 0 21248 814 19391 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+.NET+CLR+1.0.3705) - http://db.jhuccp.org/popinform/basic.html

centernet:/opt/analog/logdata/db # ./listqueries3.pl v
fieldnumber = :0: fieldname = Abstract+%7C+KeywordsMajor+%7C+KeywordsMinor+%7C+Notes+%7C+EngTitle+%7C+TT+%7C+FREAb+%7C+SPAAb
Abstract found!
Query: XC=%2Fdbtw-wpd%2Fexec%2Fdbtwpcgi.exe&BU=http%3A%2F%2Fdb.jhuccp.org%2Fpopinform%2Fbasic.html&QB0=AND&QF0=Abstract+%7C+KeywordsMajor+%7C+KeywordsMinor+%7C+Notes+%7C+EngTitle+%7C+TT+%7C+FREAb+%7C+SPAAb&QI0=China%0D%0A&QB1=AND&QF1=Author+%7C+CN&QI1=&MR=10&TN=popline&RF=ShortRecordDisplay&DF=LongRecordDisplay&DL=1&RL=1&NP=0&AC=QBE_QUERY&x=37&y=4
Match successful!
Subject:
fieldnumber = :1: fieldname = Author+%7C+CN
B 2004-03-28 00:38:31
centernet:/opt/analog/logdata/db #

-----
E. Kevin Zembower
Unix Administrator
Johns Hopkins University/Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD 21202
410-659-6139
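
On the speed question and the translate routine in the script above, here is a rough sketch of one alternative, assuming the query field uses ordinary URL percent-encoding. It splits the query string once into a hash and decodes any %XX escape with a single substitution, so the string is not rescanned once per escape. Whether this is actually faster on real logs is untested. The names parse_query and decode_value are hypothetical, and the sketch leaves out the later step that turns "and", "or", "&" and "/" into "|".

```perl
use strict;
use warnings;

# Hypothetical helper: decode one raw value from the query string.
sub decode_value {
    my ($text) = @_;
    $text =~ tr/+/ /;                               # '+' encodes a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> that character
    $text =~ s/[\r\n]//g;                           # drop CR/LF, as translate() drops %0D and %0A
    return $text;
}

# Hypothetical helper: split the query string once into name => raw value.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        $param{$name} = defined $value ? $value : '';
    }
    return %param;
}

# Shortened, made-up query string in the same shape as the log field.
my %param = parse_query(
    'QF0=Abstract+%7C+KeywordsMajor&QI0=China%0D%0A&QF1=Author+%7C+CN&QI1='
);

# Pair each QFn with the QIn that carries the same number.
for my $name (sort grep { /^QF\d+$/ } keys %param) {
    (my $qi = $name) =~ s/^QF/QI/;
    my $value = exists $param{$qi} ? $param{$qi} : '';
    printf "%s => [%s]\n", decode_value($param{$name}), decode_value($value);
}
# prints:
#   Abstract | KeywordsMajor => [China]
#   Author | CN => []
```

With the parameters in a hash, looking up the QI value for a given number is a plain hash access, so there is no capture variable to lose between matches.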