Kevin Zembower wrote:
>
> I'm trying to do a quick-n-dirty (well, I've been at work on it three
> hours now) analysis of Apache web logs. I'm trying to count the number
> of records from robots or spiders. For my purposes, a robot or spider is
> a request from either an unresolved IP address, or one that has "bot",
> "spider", "crawl" or "search" in its resolved domain name. I don't
> count at all requests that come from my LAN (172.16.0.0/16) or domain
> (jhuccp.org). My program so far is this:
>
> #!/usr/local/bin/perl -w
> my ($robotcount, $totalcount) = 0;
> while (<>) {
>    next if /^172\.16/;
>    next if /^.*?jhuccp\.org +?/;
>    $totalcount++;
>    if (/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|.*(bot|crawl|spider|search).*?) ..*$/) {
>       print;
>       $robotcount++;
>    }
> }
> print "Robot count is $robotcount\tTotal count is $totalcount\t Ratio is " . $robotcount/$totalcount . "\n";
>
> This correctly picks up the numerical IP addresses, but also matches
> records like this:
>
> dup-200-66-146-45.prodigy.net.mx - - [30/Jun/2002:00:03:50 -0400] "GET
> /prs/sj41/sj41chap1_3.stm HTTP/1.1" 200 9379
> "http://search.t1msn.com.mx/results.asp?q=relaci%C3%B3n+sexual&origq=yahoo&FORM=IE4&v=1&cfg=SMCSP&nosp=0&thr=&submitbutton.x=39&submitbutton.y=12"
> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
>
> Here, the word "search" is in the referrer field.
>
> How do I tell it to search only up to the first space character? I
> think I can do it by defining a second variable that is just the part of
> the record up to the first space, and matching on that. But is there
> another way, probably using the 'minimizing' quantifiers?

    if (/^(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\S*(?:bot|crawl|spider|search)\S*)\s/) {


John
-- 
use Perl;
program
fulfillment
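
For comparison, the other approach mentioned in the question (pull out the part of the record up to the first space, then match on that alone) might look roughly like the sketch below. The sub name is_robot and the first two sample records are made up for illustration; the third is the record from the question. It assumes the host is always the first whitespace-delimited field, as in the usual Apache log layout.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original program: classify a log
# record by looking only at its first field (the client host).
sub is_robot {
    my ($record) = @_;

    # Capture everything up to the first whitespace character.
    my ($host) = $record =~ /^(\S+)/;
    return 0 unless defined $host;

    # An unresolved address is a bare dotted quad.
    return 1 if $host =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;

    # A resolved name counts if it contains one of the keywords.
    return 1 if $host =~ /bot|crawl|spider|search/;

    return 0;
}

# Sample records: the first two are invented, the third is from the question.
my @samples = (
    q{10.1.2.3 - - [30/Jun/2002:00:00:01 -0400] "GET / HTTP/1.0" 200 100},
    q{crawler1.example.com - - [30/Jun/2002:00:00:02 -0400] "GET / HTTP/1.0" 200 100},
    q{dup-200-66-146-45.prodigy.net.mx - - [30/Jun/2002:00:03:50 -0400] "GET /prs/sj41/sj41chap1_3.stm HTTP/1.1" 200 9379 "http://search.t1msn.com.mx/results.asp?q=relaci%C3%B3n+sexual&origq=yahoo&FORM=IE4&v=1&cfg=SMCSP&nosp=0&thr=&submitbutton.x=39&submitbutton.y=12" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
);

for my $record (@samples) {
    my ($host) = split ' ', $record;
    print is_robot($record) ? "robot" : "other", "\t$host\n";
}
```

Both versions rely on the same idea: \S cannot match a space, so nothing after the first field (such as the referrer) can take part in the match. The single regex above is shorter; the two-step version is easier to extend if more host tests get added later.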