Hi,

Further to my posts 48450 & 48452, I have modified the Perl script. Here it is:

----------------------
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Time::Local;

my $file = "access_log_modified";
my $begin_time = "";
my $end_time;
my @visual_pages = ();
my @final_visual_pages = ();
my %increment = ();
my (@dates, %processed_visual_pages);
my ($datetime, $get_post, $Day, $Month, $Year, $Hour, $Minute, $Second);
my $interval = 60;    # An interval of 1 minute

count_recs();

sub count_recs {
    open(INFILE, "<", $file) || die "Cannot read from $file: $!";
    WHILELOOP:
    while (<INFILE>) {
        ($datetime, $get_post) = (split / /)[3, 6];
        next WHILELOOP unless defined $datetime && defined $get_post;    # skip malformed lines
        $datetime =~ s/\[//;
        ($Day, $Month, $Year, $Hour, $Minute, $Second) =
            $datetime =~ m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;
        # Skip static assets; only page requests are counted.
        next WHILELOOP if ($get_post =~ /\.js$/ || $get_post =~ /\.gif$/ || $get_post =~ /\.css$/);
        unless ($begin_time) { $begin_time = $datetime; }
        push(@dates, $datetime);
    }    # outer while
    close(INFILE);

    ### The code below is faster compared to the grep one
    foreach my $dateproc (@dates) { $increment{$dateproc}++; }

    foreach my $dateproc (sort keys %increment) {
        push(@{ $processed_visual_pages{$dateproc} }, $increment{$dateproc});
        ### This prints the date timestamp and the number of occurrences
        ### of this record.
        print "$dateproc @{$processed_visual_pages{$dateproc}}\n";
    }
}

-----------------

The output I get is shown below. The first column is the timestamp and the second column is the number of occurrences of that timestamp.

25/Apr/2003:13:54:02 3
25/Apr/2003:13:54:19 2
25/Apr/2003:13:54:22 4
25/Apr/2003:13:54:34 3
25/Apr/2003:13:54:38 5
25/Apr/2003:13:54:41 3
25/Apr/2003:13:54:43 6
25/Apr/2003:13:54:44 3
25/Apr/2003:13:54:46 5
25/Apr/2003:13:54:47 2
25/Apr/2003:13:54:48 3
25/Apr/2003:13:54:50 7
25/Apr/2003:13:54:51 4
25/Apr/2003:13:54:53 2
25/Apr/2003:13:54:58 3
25/Apr/2003:13:55:01 2
25/Apr/2003:13:55:02 4
25/Apr/2003:13:55:05 5
25/Apr/2003:13:55:08 1
25/Apr/2003:13:55:14 3
25/Apr/2003:13:55:15 1
25/Apr/2003:13:56:13 6
25/Apr/2003:13:56:27 5
25/Apr/2003:13:56:35 4
25/Apr/2003:13:56:40 4
25/Apr/2003:13:56:45 1
25/Apr/2003:13:56:51 5

---------------------------------

I would appreciate any ideas on how to calculate the average, min and max of the counts above, starting with the first timestamp, over a range of, say, 60 seconds. The output would look like this (since the 1 min interval starting from 13:54:02 ends at 13:55:02):

Time                  Average  Min  Max
25/Apr/2003:13:55:02  4.5      2    7
25/Apr/2003:13:56:13  3.5      1    6

This repeats for the next interval.

A snippet of the input:

127.0.0.1 - - [15/Jun/2003:13:54:02 -0100] "GET /xxxx HTTP/1.1" 200 34906
127.0.0.1 - - [15/Jun/2003:13:54:02 -0100] "GET /xxxx HTTP/1.1" 200 34906

--------------

TIA
Anand

Ramprasad <[EMAIL PROTECTED]> wrote:

Anand Babu wrote:
> Hi all,
>
> I am new to this group. I need help regarding a perl script which
> parses the web log file, access_log.
>

Welcome. This is the most friendly list I have seen.

> The format of the access_log is:
>
> 127.0.0.1 - - [15/Jun/2003:13:54:02 -0100] "GET /xxxx HTTP/1.1" 200
> 34906
>
> The goal is to

What do you expect? That someone would write a full program for you to use?
Someone might, but that way it will take you a long time to write real Perl yourself.

The best way to use a newsgroup is to write the code yourself. If you get stuck, post what you have done and what is not working. You will get enough help here.

You seem to have a fairly simple problem. A short algorithm would be:

Write a function that converts a timestamp to a date-range string, like

foo('15/Jun/2003:13:54:02')='15/Jun/2003:13:45:00-15/Jun/2003:14:00:00'
...
..
# Use a smaller string if you find this range string very long
..

Now read the file line by line:

while (<>) {
    ($x,$y,...$timestamp,....) = split;    # Fill in the blanks later
    $hash{foo($timestamp)}++;
}

Now %hash has all the info you need.

Best of luck,
Ram

PS: BTW, have a look at analog (http://www.analog.cx/) before you write any code yourself. You might find what you want.
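
The bucketing idea above, applied to the per-timestamp counts that the script already produces, might look roughly like the sketch below. It is only an illustration under stated assumptions: the names to_epoch and window_stats are hypothetical, the timezone offset in the log is ignored, each window is half-open (start up to, but not including, start + interval) and is labelled with the first timestamp seen in it, and the average is taken only over seconds that had at least one hit. The windowing in the expected output above may be defined differently, so the numbers need not match it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Convert '25/Apr/2003:13:54:02' to epoch seconds. Assumption: the timezone
# offset is ignored, since only differences between timestamps matter here.
sub to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $h, $m, $s) =
        $stamp =~ m#^(\d\d)/(\w\w\w)/(\d{4}):(\d\d):(\d\d):(\d\d)#
        or return undef;
    return undef unless exists $MONTH{$mon};
    return timegm($s, $m, $h, $day, $MONTH{$mon}, $year);
}

# Group per-timestamp counts into windows of $interval seconds, measured
# from the earliest timestamp. Returns one [start, average, min, max] row
# per non-empty window, in time order.
sub window_stats {
    my ($count, $interval) = @_;    # $count: { timestamp => hits }

    my %epoch;
    for my $stamp (keys %$count) {
        my $e = to_epoch($stamp);
        $epoch{$stamp} = $e if defined $e;    # unparsable stamps are skipped
    }
    my @stamps = sort { $epoch{$a} <=> $epoch{$b} } keys %epoch;
    return () unless @stamps;

    my $first = $epoch{ $stamps[0] };
    my %win;
    for my $stamp (@stamps) {
        my $idx = int( ($epoch{$stamp} - $first) / $interval );
        my $n   = $count->{$stamp};
        my $w   = $win{$idx} ||=
            { start => $stamp, sum => 0, n => 0, min => $n, max => $n };
        $w->{sum} += $n;
        $w->{n}++;
        $w->{min} = $n if $n < $w->{min};
        $w->{max} = $n if $n > $w->{max};
    }

    return map {
        my $w = $win{$_};
        [ $w->{start}, $w->{sum} / $w->{n}, $w->{min}, $w->{max} ]
    } sort { $a <=> $b } keys %win;
}

# Small demo using a few of the counts from the output posted above.
my %count = (
    '25/Apr/2003:13:54:02' => 3,
    '25/Apr/2003:13:54:50' => 7,
    '25/Apr/2003:13:55:02' => 4,
    '25/Apr/2003:13:55:08' => 1,
);
printf "%s %.2f %d %d\n", @$_ for window_stats(\%count, 60);
```

In the posted script, the %increment hash could be passed straight in, e.g. window_stats(\%increment, $interval), once the counting loop has finished.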