> In the script, there are two variables produced from a regexp match > against the current line in the access_log: > > $new_ip > $user_agent > > and one that is determined by a subroutine that looks for clues that > the user_agent is a robot: > > $user_type > > This is the subroutine that builds the hash of hashes. It compares the > $new_ip in the current line of the access_log with IPs already in the > hash and updates the user_type/user_agent info if it finds a match and > creates a new entry if it doesn't. > > sub build_user_hash { > foreach $existing_ip (%user_hash) { > if ($existing_ip eq $new_ip ) { > $user_hash {$existing_ip} {user_type} = $user_type; > $user_hash {$existing_ip} {user_agent} = $user_agent; > return; > } > } > $user_hash {$new_ip} {user_type} = $user_type; > $user_hash {$new_ip} {user_agent} = $user_agent; > $unique_ip_count++; > } >
Okay -- now that I have some code to look at, I can give you better advice: 1) use strict; 2) REALLY ... We Mean It. ***USE STRICT*** 3) global variables are for 12 year old VB programmers. Avoid them unless absolutely necessary. Here's a slightly cleaned up version of your program, with some data to test with ... N.B. That you don't want to iterate across the entire %user_hash in the foreach -- you want to run once for each key, not once for the key and again for the value ... It turns out, that in a scalar context the value of each of these keys is a string of the form HASH(0x11fa80). This string is unlikely to colllide with your actual IP address, and therefore won't harm your script, but it's Bad Style. .................................. BEGIN PERL PROGRAM .................. #!/usr/bin/perl use strict; use warnings; use Data::Dumper; our %user_hash; our $unique_ip_count; sub build_user_hash { my ($new_ip, $user_agent, $user_type) = @_ ; foreach my $existing_ip ( keys %user_hash) { if ($existing_ip eq $new_ip ) { $user_hash {$existing_ip} {user_type} = $user_type; $user_hash {$existing_ip} {user_agent} = $user_agent; return; } } $user_hash {$new_ip} {user_type} = $user_type; $user_hash {$new_ip} {user_agent} = $user_agent; $unique_ip_count++; } # populate the hash with some fake data while (<DATA>) { if (my ($ip, $agent, $type ) = m/^(.*) - (.*) - (.*)$/) { build_user_hash($ip, $agent, $type) ; } } print Dumper \%user_hash; print "Here come da judge ... \n"; foreach my $key ( keys %user_hash ) { print "$key \"$user_hash{$key}{user_agent}\"\n" if $user_hash{$key}{user_type} eq 'robot'; } __DATA__ 192.168.1.1 - ...TOPS-20...EMACS... - human 192.168.1.1 - ...Linux...Opera... - human 192.168.1.3 - ...Yahoo!Slurp... - robot 192.168.1.5 - ...Windows...MSIE... - human 192.168.1.7 - ...Solaris...Mozilla... - human 192.168.1.5 - ...GNU...LWP/0.1... - robot ................................... END PERL PROGRAM ................... Producing the output: $VAR1 = { '192.168.1.1' => { 'user_type' => 'human', 'user_agent' => '...Linux...Opera...' }, '192.168.1.3' => { 'user_type' => 'robot', 'user_agent' => '...Yahoo!Slurp...' }, '192.168.1.5' => { 'user_type' => 'robot', 'user_agent' => '...GNU...LWP/0.1...' }, '192.168.1.7' => { 'user_type' => 'human', 'user_agent' => '...Solaris...Mozilla...' } }; Here come da judge ... 192.168.1.3 "...Yahoo!Slurp..." 192.168.1.5 "...GNU...LWP/0.1..." ........................................................................ So -- this program does in fact work, but may not do what you want. For example: Is using the IP address as the key really a good idea? Might an IP address be assigned to two different users -- or might a person use two different user agents on one host? Do you want to deal with that? Do you want the per-host to be a LIST of browser/flavor pairs? Consider the following structure: our %users = ( '192.168.1.1' => [ { user_type => 'human', user_agent => '...Browser...' }, { user_type => 'robot', user_agent => '...Some Robot...' } ], '192.168.1.2' => [ { user_type => 'robot', user_agent => '...Another Robot...' } ], '192.168.1.3' => [ { user_type => 'human', user_agent => '...Browser X...' }, { user_type => 'human', user_agent => '...Browser Y...' } ] ); How do you think you'd create that and iterate across it...? Your homework: Starting with that structure above, write a subroutine that will iterate across it producing the following output: Hint - it will definitely be easier to explicitly break the structure into lists and hashes at first. Then squeeze your code to eliminate the intermediate variables ... foreach my $key (keys %user_hash ) { my @list = @{$user_hash{$key}}; . . . } lawrence /tmp > perl /tmp/ziz.pl A human at 192.168.1.1 used ...Browser... A robot at 192.168.1.1 used ...Some Robot... A human at 192.168.1.3 used ...Browser X... A human at 192.168.1.3 used ...Browser Y... A robot at 192.168.1.2 used ...Another Robot... -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Lawrence Statton - [EMAIL PROTECTED] s/aba/c/g Computer software consists of only two components: ones and zeros, in roughly equal proportions. All that is required is to sort them into the correct order. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>