Kevin Zembower wrote:
> 
> I'm trying to do a quick-n-dirty (well, I've been at work on it three
> hours now) analysis of Apache web logs. I'm trying to count the number
> of records from robots or spiders. For my purposes, a robot or spider is
> a request from either an unresolved IP address, or one that has "bot",
> "spider", "crawl" or "search" in its resolved domain name. I don't
> count at all requests that come from my LAN (172.16.0.0/16) or domain
> (jhuccp.org). My program so far is this:
> #!/usr/local/bin/perl -w
> my ($robotcount, $totalcount) = (0, 0);
> while (<>) {
>    next if /^172\.16/;
>    next if /^.*?jhuccp\.org +?/;
>    $totalcount++;
>    if (/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|.*(bot|crawl|spider|search).*?) ..*$/) {
>       print;
>       $robotcount++;
>       }
>    }
> print "Robot count is $robotcount\tTotal count is $totalcount\t Ratio is " . $robotcount/$totalcount . "\n";
> 
> This correctly picks up the numerical IP addresses, but also matches
> records like this:
> dup-200-66-146-45.prodigy.net.mx - - [30/Jun/2002:00:03:50 -0400] "GET
> /prs/sj41/sj41chap1_3.stm HTTP/1.1" 200 9379
> 
>"http://search.t1msn.com.mx/results.asp?q=relaci%C3%B3n+sexual&origq=yahoo&FORM=IE4&v=1&cfg=SMCSP&nosp=0&thr=&submitbutton.x=39&submitbutton.y=12";
> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
> 
> Here, the word "search" is in the referrer field.
> 
> How do I tell it to search only up to the first space character? I
> think I can do it by defining a second variable that is just the part of
> the record up to the first space, and matching on that. But is there
> another way, probably using the 'minimizing' quantifiers?


if (/^(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\S*(?:bot|crawl|spider|search)\S*)\s/) {
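
The reason this works is that \S* can only consume non-whitespace, so the
match cannot run past the first space into the referrer field, whereas .*
is free to cross spaces no matter which quantifier follows it. Below is a
minimal sketch of how the condition might sit in the whole counting
script. The sub names is_robot and count_robots, the example.com and
192.0.2.10 sample records, and the tightened ^172\.16\. test are my own
assumptions for illustration, not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Host field is either a bare dotted-quad (unresolved address) or a name
# containing one of the keywords. \S* never crosses whitespace, so the
# match stays inside the first field of the record.
my $robot_re =
    qr/^(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\S*(?:bot|crawl|spider|search)\S*)\s/;

# Returns 1 if the record's host field looks like a robot, else 0.
sub is_robot {
    my ($line) = @_;
    return $line =~ $robot_re ? 1 : 0;
}

# Returns (robot count, total count) for a list of log records,
# ignoring records from the LAN and from the local domain.
sub count_robots {
    my (@lines) = @_;
    my ($robotcount, $totalcount) = (0, 0);
    for my $line (@lines) {
        next if $line =~ /^172\.16\./;          # LAN; trailing \. avoids 172.160.x.x
        next if $line =~ /^\S*jhuccp\.org\s/;   # own domain, host field only
        $totalcount++;
        $robotcount++ if is_robot($line);
    }
    return ($robotcount, $totalcount);
}

# Made-up sample records, plus a shortened copy of the one from the question.
my @sample = (
    q{dup-200-66-146-45.prodigy.net.mx - - [30/Jun/2002:00:03:50 -0400] "GET /prs/sj41/sj41chap1_3.stm HTTP/1.1" 200 9379 "http://search.t1msn.com.mx/results.asp?q=x" "Mozilla/4.0"},
    q{192.0.2.10 - - [30/Jun/2002:00:04:01 -0400] "GET / HTTP/1.0" 200 512},
    q{crawler1.example.com - - [30/Jun/2002:00:04:09 -0400] "GET /robots.txt HTTP/1.0" 200 64},
    q{172.16.4.2 - - [30/Jun/2002:00:04:12 -0400] "GET / HTTP/1.0" 200 512},
    q{pc12.jhuccp.org - - [30/Jun/2002:00:04:15 -0400] "GET / HTTP/1.0" 200 512},
);

my ($robots, $total) = count_robots(@sample);
my $ratio = $total ? $robots / $total : 0;      # guard against an empty log
printf "Robot count is %d\tTotal count is %d\tRatio is %.3f\n",
    $robots, $total, $ratio;
```

To run it over a real log, replace @sample with the lines read from <>.
The prodigy.net.mx record from the question should no longer be counted,
because "search" appears only after the first space.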


John
-- 
use Perl;
program
fulfillment
