Mike Blomgren wrote:

> Hi,
> 
> I'm trying to write a pattern-matching regexp with two optional
> parenthesized groups, but I can't figure out how to have an 'optional' match.
> I.e. I want a match regardless of whether the last two fields are available or
> not. But if they are available, I want to use them... I'm confident there
> is a simple solution - I just haven't found it yet...
> 
> In practice:
> The logfiles are from several Apache webservers. Some files contain two
> additional fields containing Referer and Browser type, which are last on
> each line (example below, may be wrapped).
> 
> 10.0.0.1 - - [30/Aug/2001:14:58:16 +0200] "GET /banner_1.gif HTTP/1.1"
> 200 12796 
> "http://example.com/" "Mozilla/5.0 (Windows; U; Win98; en-US; m18)
> Gecko/20010131 Netscape6/6.01" 
> 
> My pattern-matching code looks as follows:
> 
>     if ( m/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})   # IP Address
>          \x20(.+?)                        # User
>          \x20(.+?)                        # unused
>          \x20(\[.+\])                     # Date
>          \x20\"(.*?\n*?.*?)               # Request
>          (HTTP\/.*?|)\"                   # Match regardless of HTTP version
>          \x20(\d+?)                       # Statuscodes
>          \x20([\-\d]+?)                   # Size
>          \x20(\".*?\")                    # Optional Referer
>          \x20(\".*?\")                    # Optional Browser type
>          /ox )
> 
> However, it's the last two fields ($9 and $10) that I want to be
> optional. If they don't exist in the current line being matched, I still
> want the rest of the fields to be populated ($1 - $8). I.e. an
> 'optional' match...
> 
> One alternative is to have two different pattern matching statements, but
> that would complicate matters. There are more 'optionals' than just
> these examples...
> 
> Any help would be greatly appreciated. And yes, I have read the docs,
> but simply not understood them.


The simpler way to do this is to split on whitespace; then it's a simple
matter to pick the fields out of the resulting array.  Everything from
field 11 onward (counting from 0) can be rejoined as the browser.

Basically:

my ($ip, $f1, $f2, $dt, $tz, $meth, $page, $proto, $status, $len, $f10,
   $browser, @rest) = split /\s+/, $_;

# <do stuff here>

# recombine browser info

# (skipped when the line has no optional fields, so $browser stays undef)
$browser = join ' ', $browser, @rest if defined $browser;
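
If you would rather stay with a single regexp, the usual way to get an
'optional' match is to wrap the optional fields in a non-capturing group
followed by '?'. Below is a rough sketch of that idea, not a tested
replacement for your pattern. The sub name parse_line and the hash keys are
made up for illustration, and it assumes the request, referer and browser
fields contain no embedded double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one access-log line.
# Returns a hash ref on success, nothing if the line does not match.
sub parse_line {
    my ($line) = @_;

    my @f = $line =~ m{
        ^(\d{1,3}(?:\.\d{1,3}){3})    # IP address
        \x20(\S+)                     # ident (usually "-")
        \x20(\S+)                     # user
        \x20\[([^\]]+)\]              # date, without the brackets
        \x20"([^"]*)"                 # request line
        \x20(\d{3})                   # status code
        \x20(\d+|-)                   # size
        (?:                           # optional group: both fields or neither
            \x20"([^"]*)"             # referer
            \x20"([^"]*)"             # browser
        )?
        \s*\z
    }x or return;

    # Captures inside an unmatched optional group come back as undef.
    my %rec;
    @rec{qw(ip ident user date request status size referer browser)} = @f;
    return \%rec;
}
```

With this, a short line still fills in the first seven fields, and you can
test "defined $rec->{referer}" to see whether the optional part was there.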

-- 
   ,-/-  __      _  _         $Bill Luebkert   ICQ=14439852
  (_/   /  )    // //       DBE Collectibles   Mailto:[EMAIL PROTECTED]
   / ) /--<  o // //      http://dbecoll.tripod.com/ (Free site for Perl)
-/-' /___/_<_</_</_     Castle of Medieval Myth & Magic http://www.todbe.com/

_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
