Robert wrote:
>
> Hi list,
> I am writing a script to parse web logs, this script captures IP, DATE
> and TIME and
> Requested page in $host, $date, $request variables. How do I capture
> "http://www.portallink.com/finder.jhtml
> <javascript:openExternal('http://www.portallink.com/finder.jhtml')>" in
> another variable $url in the same way.
>
> Script:
> #!/usr/bin/perl -w
>
> $logfile = shift || &usage;
>
> # forward declarations
> my ($host, $date, $request);
>
> # open the log
> open(LOG, "<$logfile") or die "could not open $logfile\n";
>
> while(<LOG>) {
> ($host, $date, $request) =
> $_ =~ m{
> (.*?)\s # host name or IP
> \[(.*?)\]\s # date
> \"(.*?)\"\s # request method
> \"http\:\/\/(.*?)\"\s # url <--- (This should match http://www.portallink.com/....jhtml, right? But it is giving errors.)
> }x;
>
> print "$host $date $request \n";
> }
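Your pattern fails because the status code and byte count sit between the
request and the referring URL, so nothing matches and every variable stays
undefined. Try something more like:

    while (<LOG>) {
        my ($host, $date, $request, $url) = $_ =~ m{
            (.*?)\s+              # host name or IP
            [^[]+                 # garbage (ident and user fields)
            \[(.*?)\]\s+          # date
            "(.*?)"\s+            # request method, page and version
            .*?                   # more garbage (status and byte count)
            "(-|http://.*?)"\s+   # url can be "-"; single group, so it lands in $url
        }x or next;               # skip lines that don't match
        print "$host\n";
        print "\t$date\n";
        print "\t$request\n";
        print "\t$url\n";
    }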
I would normally use split on a log file. The spaces embedded inside the
double-quoted fields complicate that a bit, but it's still doable.
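
One way the split approach could look, as a rough sketch only: split on the
double quote first, so the spaces inside the quoted fields stop mattering,
then pick the host and date out of the leading unquoted piece. The sub name
parse_line is made up for illustration, and the sketch assumes no field
contains an escaped double quote.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): parse one log line
# by splitting on double quotes instead of using one large regex.
sub parse_line {
    my ($line) = @_;
    chomp $line;

    # Pieces alternate: unquoted, quoted, unquoted, quoted, ...
    my ($head, $request, $status_bytes, $url) = split /"/, $line;
    return unless defined $url;           # too few quoted fields: give up

    my ($host) = split ' ', $head;        # first whitespace-separated token
    my ($date) = $head =~ /\[(.*?)\]/;    # text between [ and ]
    return ($host, $date, $request, $url);
}

# Third entry from the sample log, joined back onto one line.
my $sample = '216.177.64.101 - - [09/Apr/2003:00:00:45 -0400] '
           . '"GET /index.jhtml HTTP/1.0" 200 19243 "-" '
           . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"';

my ($host, $date, $request, $url) = parse_line($sample);
print "$host\n\t$date\n\t$request\n\t$url\n";
```

Here $url comes back as "-" when there is no referrer, and as the full quoted
string (including the http:// prefix) otherwise.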
> Sample web log
> 216.177.64.101 - - [09/Apr/2003:00:00:41 -0400] "GET /logout.jhtml
> HTTP/1.0" 200 15758 "http://www.portallink.com/finder.jhtml
> <javascript:openExternal('http://www.portallink.com/finder.jhtml')>"
> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
> "ProfileCookie=demositelite,XYZ Company, Inc.,;
> JSESSIONID=OCZA0EYAAACSYCQFAKLSFEQKAUBJQI5G;
> anonauth=XYZuser%40XUMA123g;
> auth=demosite%40a3f7edcd509ca46e516f0249cbdc5f53;
> XYZdomain=www.portallink.com"
> 216.177.64.101 - - [09/Apr/2003:00:00:44 -0400] "GET
> /index.jhtml;jsessionid=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G;jsessionid=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G?LOGOUT=yes&DPSLogout=true&_requestid=42166
> HTTP/1.0" 200 8194 "http://www.portallink.com/logout.jhtml
> <javascript:openExternal('http://www.portallink.com/logout.jhtml')>"
> "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
> "ProfileCookie=demosite,XYZ Company, Inc.,;
> JSESSIONID=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G;
> anonauth=XYZuser%40XUMA123g;
> auth=demosite%40a3f7edcd509ca46e516f0249cbdc5f53;
> XYZdomain=www.portallink.com"
> 216.177.64.101 - - [09/Apr/2003:00:00:45 -0400] "GET /index.jhtml
> HTTP/1.0" 200 19243 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT
> 5.0)" "ProfileCookie=demosite,XYZ Company, Inc.,;
> JSESSIONID=OC0BPEIAAACS2CQFAKLSFEQKAUBJQI5G;
> anonauth=XYZuser%40XUMA123g;
> auth=demosite%40a3f7edcd509ca46e516f0249cbdc5f53;
> XYZdomain=www.portallink.com"
>
> Errors
> ./parse.pl sample
>
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
>
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
>
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
>
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 1.
>
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 2.
>
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 2.
>
> Use of uninitialized value in concatenation (.) or string at t1.pl line
> 20, <LOG> line 2.
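
Those warnings only mean the match failed, so the variables were still
undefined when print ran. A minimal sketch of checking the match before
printing follows; the sub name fields_or_nothing and its shortened pattern
are assumptions for illustration, not your original regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (host, date, request) on a match,
# or an empty list when the line does not look like a log entry.
sub fields_or_nothing {
    my ($line) = @_;
    return $line =~ m{
        ^(\S+)          # host name or IP
        .*?             # ident and user fields
        \[(.*?)\]\s+    # date
        "(.*?)"         # request
    }x;
}

my @lines = (
    '216.177.64.101 - - [09/Apr/2003:00:00:45 -0400] "GET /index.jhtml HTTP/1.0" 200 19243 "-"',
    'this line is not a log entry',
);

for my $line (@lines) {
    # A list assignment in boolean context is true only if something matched.
    if (my ($host, $date, $request) = fields_or_nothing($line)) {
        print "$host $date $request\n";
    }
    else {
        warn "no match: $line\n";
    }
}
```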
--
,-/- __ _ _ $Bill Luebkert Mailto:[EMAIL PROTECTED]
(_/ / ) // // DBE Collectibles Mailto:[EMAIL PROTECTED]
/ ) /--< o // // Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_</_</_ http://dbecoll.tripod.com/ (Free site for Perl/Lakers)