> In the script, there are two variables produced from a regexp match 
> against the current line in the access_log:
> 
> $new_ip
> $user_agent
> 
> and one that is determined by a subroutine that looks for clues that 
> the user_agent is a robot:
> 
> $user_type
> 
> This is the subroutine that builds the hash of hashes. It compares the 
> $new_ip in the current line of the access_log with IPs already in the 
> hash and updates the user_type/user_agent info if it finds a match and 
> creates a new entry if it doesn't.
> 
> sub build_user_hash {
>      foreach $existing_ip (%user_hash) {
>          if ($existing_ip eq $new_ip ) {
>                       $user_hash {$existing_ip} {user_type} = $user_type;
>                       $user_hash {$existing_ip} {user_agent} = $user_agent;
>                       return;
>          }
>      }
>       $user_hash {$new_ip} {user_type} = $user_type;
>       $user_hash {$new_ip} {user_agent} = $user_agent;
>       $unique_ip_count++;
> }
> 

Okay -- now that I have some code to look at, I can give you better advice:

1)  use strict;

2)  REALLY ... We Mean It.  ***USE STRICT***

3)  global variables are for 12 year old VB programmers.  Avoid them
    unless absolutely necessary.

Here's a slightly cleaned up version of your program, with some data
to test with ... 

N.B. That you don't want to iterate across the entire %user_hash in
the foreach -- you want to run once for each key, not once for the key
and again for the value ... It turns out, that in a scalar context the
value of each of these keys is a string of the form HASH(0x11fa80).
This string is unlikely to colllide with your actual IP address, and
therefore won't harm your script, but it's Bad Style.  

.................................. BEGIN PERL PROGRAM ..................
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper; 

our %user_hash; 
our $unique_ip_count; 

sub build_user_hash {
    my ($new_ip, $user_agent, $user_type) = @_ ; 
    foreach my $existing_ip ( keys %user_hash) {
        if ($existing_ip eq $new_ip ) {
            $user_hash {$existing_ip} {user_type} = $user_type;
            $user_hash {$existing_ip} {user_agent} = $user_agent;
            return;
        }
    }
    $user_hash {$new_ip} {user_type} = $user_type;
    $user_hash {$new_ip} {user_agent} = $user_agent;
    $unique_ip_count++;
}

# populate the hash with some fake data 

while (<DATA>) {
    if (my ($ip, $agent, $type ) = m/^(.*) - (.*) - (.*)$/) {
        build_user_hash($ip, $agent, $type) ;
    }
}

print  Dumper \%user_hash; 

print "Here come da judge ... \n"; 

foreach my $key ( keys %user_hash ) {
    print "$key \"$user_hash{$key}{user_agent}\"\n"
        if $user_hash{$key}{user_type} eq 'robot';
}


__DATA__
192.168.1.1 - ...TOPS-20...EMACS... - human
192.168.1.1 - ...Linux...Opera... - human
192.168.1.3 - ...Yahoo!Slurp... - robot
192.168.1.5 - ...Windows...MSIE... - human
192.168.1.7 - ...Solaris...Mozilla... - human
192.168.1.5 - ...GNU...LWP/0.1... - robot

................................... END PERL PROGRAM ...................

Producing the output:


$VAR1 = {
          '192.168.1.1' => {
                             'user_type' => 'human',
                             'user_agent' => '...Linux...Opera...'
                           },
          '192.168.1.3' => {
                             'user_type' => 'robot',
                             'user_agent' => '...Yahoo!Slurp...'
                           },
          '192.168.1.5' => {
                             'user_type' => 'robot',
                             'user_agent' => '...GNU...LWP/0.1...'
                           },
          '192.168.1.7' => {
                             'user_type' => 'human',
                             'user_agent' => '...Solaris...Mozilla...'
                           }
        };
Here come da judge ... 
192.168.1.3 "...Yahoo!Slurp..."
192.168.1.5 "...GNU...LWP/0.1..."

........................................................................

So -- this program does in fact work, but may not do what you want.

For example: Is using the IP address as the key really a good idea?
Might an IP address be assigned to two different users -- or might a
person use two different user agents on one host?  Do you want to deal
with that?  Do you want the per-host to be a LIST of browser/flavor
pairs?  

Consider the following structure:

our %users = ( '192.168.1.1' => [ { user_type => 'human',
                                    user_agent => '...Browser...' },
                                  { user_type => 'robot',
                                    user_agent => '...Some Robot...' } ],
               '192.168.1.2' => [ { user_type => 'robot',
                                    user_agent => '...Another Robot...' } ],
               '192.168.1.3' => [ { user_type => 'human',
                                    user_agent => '...Browser X...' },
                                  { user_type => 'human',
                                    user_agent => '...Browser Y...' } ] ); 


How do you think you'd create that and iterate across it...?

Your homework:  Starting with that structure above, write a subroutine
that will iterate across it producing the following output:

Hint - it will definitely be easier to explicitly break the structure
into lists and hashes at first.  Then squeeze your code to eliminate
the intermediate variables ... 

foreach my $key (keys %user_hash ) {
    my @list = @{$user_hash{$key}}; 
    .
    .
    .
}

lawrence  /tmp > perl /tmp/ziz.pl
A human at 192.168.1.1 used ...Browser...
A robot at 192.168.1.1 used ...Some Robot...
A human at 192.168.1.3 used ...Browser X...
A human at 192.168.1.3 used ...Browser Y...
A robot at 192.168.1.2 used ...Another Robot...

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
        Lawrence Statton - [EMAIL PROTECTED] s/aba/c/g
Computer  software  consists of  only  two  components: ones  and
zeros, in roughly equal proportions.   All that is required is to
sort them into the correct order.

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>


Reply via email to