Re: Regular Expressions http error code

Rob Dixon Fri, 14 Mar 2003 03:36:42 -0800

Hi Derek.

Derek Romeyn wrote:
> Using your idea I ended up with data like this.  Which is odd because
> the database should only include 400 and 500 type errors.
>
[snip]
>
> 404     24.54.175.153 - - [11/Mar/2003:07:48:37 -0800] "GET 
> /e/t/invest/img/spacer.gif HTTP/1.1" 404 0 "https://
> 370     209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] "GET 
> /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370
> 526     66.196.65.24 - - [11/Mar/2003:07:54:32 -0800] "GET 
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/5.0 (Slur
> 178     167.127.163.141 - isklvjyy [11/Mar/2003:08:02:46 -0800] "GET /e/t/aaa 
> HTTP/1.1" 500 178 "-" "Mozilla/4.0 (compatible
> 404     68.39.167.38 - - [11/Mar/2003:08:06:34 -0800] "GET /e/t/aaa/img/spacer.gif 
> HTTP/1.1" 404 0 "https://us.etrade.com/e/
> 526     65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET 
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en
> 526     65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET 
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en
>
> The 404's were right but the rest took the second group of numbers
> instead of the needed first.
>
> This is how my code looked:
>
> my $code,$msg;
> foreach (@RAW_DATA) {
>         $code = $1 if m|HTTP.*\s+(\d{3})|g;


Here's your problem. You're searching for 'HTTP', followed
by any number of any character, followed by one or more
whitespace characters and three digits. Because the '.*' will
eat up as much as it can, the captured digits will be the
/last/ run of three digits that follows whitespace on the line.
The 404 lines only came out right because the number after
the status there is a single 0, which \d{3} can't match. If you
change '.*' into '.*?' it will match as few characters as possible
and you'll get the three digits you want.
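
To see the difference in isolation, here is a minimal sketch that
runs both forms of the pattern against one of the log lines quoted
above. The variable names ($line, $greedy, $lazy) are mine and not
from Derek's script.

```perl
use strict;
use warnings;

# Sample line taken from the log excerpt quoted above.
my $line = '209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] '
         . '"GET /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370';

# Greedy: '.*' runs to the end of the line, then backs off only as far
# as the last place where whitespace and three digits can match.
my ($greedy) = $line =~ m|HTTP.*\s+(\d{3})|;

# Non-greedy: '.*?' grows one character at a time and stops at the
# first place where whitespace and three digits can match.
my ($lazy) = $line =~ m|HTTP.*?\s+(\d{3})|;

print "greedy: $greedy\n";   # greedy: 370
print "lazy:   $lazy\n";     # lazy:   400
```

Note that \d{3} will also match the first three digits of a longer
number, so the non-greedy form relies on the status code being the
first number after 'HTTP', as it is in your data.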

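Also, do you need the /g modifier on this search? In scalar
context it only records where the match ended so that a later
/g match on the same string can carry on from there, so I don't
think it can make any difference in this context. I'd
recommend using /x though so that you can lay the pattern out
a little more readably.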
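
As an illustration of what /x buys you, here is a sketch of a
commented pattern. It is a tighter variant than the one discussed
above, not the same regex: it assumes the status code always comes
straight after the closing quote of the request, as it does in the
sample lines. The sample line is trimmed after the referrer field,
and $line and $status are names I've made up for the example.

```perl
use strict;
use warnings;

# One of the quoted log lines, trimmed after the referrer field.
my $line = '66.196.65.24 - - [11/Mar/2003:07:54:32 -0800] '
         . '"GET /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-"';

my $status;
if ( $line =~ m{
        HTTP/\d\.\d "   # protocol version and the closing quote
        \s+             # whitespace separating request and status
        (\d{3})         # the three-digit status code
        \s              # whitespace next, so not part of a longer number
    }x ) {
    $status = $1;
}

print "$status\n";
```

With /x, whitespace inside the pattern is ignored and '#' starts a
comment, so a literal space has to be written as \s or escaped.
Anchoring on 'HTTP/' plus the version also steps over the 'HTTP' in
'HTTP-request' without relying on backtracking.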

>         ($timestamp, $msg) = split(/\t/);

I'm not clear from your data which fields you're extracting,
but I assume this split works as you haven't said otherwise.

>         if (!$code) {
>                 print "NEXT\n";
>                 next;
>         }

Surely you really want to 'next' if the initial match fails?

>         print "$code\t$msg\n";
>         $code = 0;
> }
>
> I did manage to get a version of George's to work.  Still interested
> in trying all variations though.

The following corrects all my points above. Use it if you like it.

HTH,

Rob



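    foreach (@RAW_DATA) {

        # Skip any line with no three-digit number after 'HTTP'
        unless ( m|  HTTP.*?  \s+  (\d{3})  |x ) {
            print "NEXT\n";
            next;
        }

        # Non-greedy '.*?' makes $1 the first such number: the status
        my $code = $1;
        my ($timestamp, $msg) = split(/\t/);

        print "$code\t$msg\n";
    }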



