Robert wrote:
> 
> Hi list,
> I am writing a script to parse web logs. It captures the IP, the date
> and time, and the requested page in the $host, $date and $request
> variables. How do I capture
> "http://www.portallink.com/finder.jhtml
> <javascript:openExternal('http://www.portallink.com/finder.jhtml')>" in
> another variable, $url, in the same way?
> 
> Script:
> #!/usr/bin/perl -w
> 
> $logfile = shift || &usage;
> 
> # forward declarations
> my ($host, $date, $request);
> 
> # open the log
> open(LOG, "<$logfile") or die "could not open $logfile\n";
> 
> while(<LOG>) {
> ($host, $date, $request) =
> $_ =~ m{
> (.*?)\s # host name or IP
> \[(.*?)\]\s # date
> \"(.*?)\"\s # request method
> \"http\:\/\/(.*?)\"\s # url <--- (This should match http://www.portallink.com/....jhtml, right? But it is giving errors.)
> }x;
> 
> print "$host $date $request \n";
> }

Try something more like this (the referring URL can also be "-", so allow for that):

while (<LOG>) {
        # skip lines that don't match, so nothing undefined gets printed
        my ($host, $date, $request, $url) = $_ =~ m{
            ^(\S+)\s+                           # host name or IP
            [^[]+                               # garbage
            \[(.*?)\]\s+                        # date
            \"(.*?)\"\s+                        # request method, page and version
            .*?                                 # more garbage
            \"(-|http://.*?)\"\s+               # url, or "-" if none (one capture group)
          }x or next;
        print "$host\n";
        print "\t$date\n";
        print "\t$request\n";
        print "\t$url\n";
}

I would normally use split on a log file.  The spaces embedded in the
double-quoted fields complicate it a bit, but it's still doable.
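
A rough sketch of what the split approach might look like, offered as one
possible way to do it rather than a tested solution. It assumes each log
entry is on a single line, that quoted fields never contain an escaped
double quote, and that the referring URL is the second quoted field, as in
the sample. The name parse_line is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (host, date, request, referrer) for one
# log line, or an empty list if the line does not have the expected shape.
sub parse_line {
    my ($line) = @_;
    chomp $line;

    # Splitting on the double quote leaves the unquoted text at even
    # indexes and the contents of each quoted field at odd indexes.
    my @part = split /"/, $line;
    return if @part < 4;

    # $part[0] looks like: host ident user [date zone]
    my ($host, $ident, $user, $rest) = split ' ', $part[0], 4;
    return unless defined $rest;
    my ($date) = $rest =~ /\[(.*?)\]/ or return;

    my $request  = $part[1];    # method, page and version
    my $referrer = $part[3];    # "-" when there is no referring URL

    return ($host, $date, $request, $referrer);
}

my $sample = '216.177.64.101 - - [09/Apr/2003:00:00:45 -0400] '
           . '"GET /index.jhtml HTTP/1.0" 200 19243 "-" '
           . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"';

my ($host, $date, $request, $url) = parse_line($sample);
print "$host\n\t$date\n\t$request\n\t$url\n";
```

In the read loop this would be called once per line, skipping any line for
which it returns an empty list.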

> Sample web log
> 216.177.64.101 - - [09/Apr/2003:00:00:41 -0400] "GET /logout.jhtml
> HTTP/1.0" 200 15758 "http://www.portallink.com/finder.jhtml
> <javascript:openExternal('http://www.portallink.com/finder.jhtml')>"
> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
> "ProfileCookie=demositelite,XYZ Company, Inc.,;
> JSESSIONID=OCZA0EYAAACSYCQFAKLSFEQKAUBJQI5G;
> anonauth=XYZuser%40XUMA123g;
> auth=demosite%40a3f7edcd509ca46e516f0249cbdc5f53;
> XYZdomain=www.portallink.com"
> 216.177.64.101 - - [09/Apr/2003:00:00:44 -0400] "GET
> /index.jhtml;jsessionid=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G;jsessionid=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G?LOGOUT=yes&DPSLogout=true&_requestid=42166
> HTTP/1.0" 200 8194 "http://www.portallink.com/logout.jhtml
> <javascript:openExternal('http://www.portallink.com/logout.jhtml')>"
> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
> "ProfileCookie=demosite,XYZ Company, Inc.,;
> JSESSIONID=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G;
> anonauth=XYZuser%40XUMA123g;
> auth=demosite%40a3f7edcd509ca46e516f0249cbdc5f53;
> XYZdomain=www.portallink.com"
> 216.177.64.101 - - [09/Apr/2003:00:00:45 -0400] "GET /index.jhtml
> HTTP/1.0" 200 19243 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT
> 5.0)" "ProfileCookie=demosite,XYZ Company, Inc.,;
> JSESSIONID=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G;
> anonauth=XYZuser%40XUMA123g;
> auth=demosite%40a3f7edcd509ca46e516f0249cbdc5f53;
> XYZdomain=www.portallink.com"
> 
> Errors
> ./parse.pl sample
> 
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
> 
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
> 
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
> 
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
> 
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 2.
> 
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 2.
> 
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 2.


-- 
  ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
 (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
  / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_</_</_    http://dbecoll.tripod.com/ (Free site for Perl/Lakers)


_______________________________________________
ActivePerl mailing list
