Toddy Prawiraharjo wrote:
>
> From: Rob Dixon [mailto:[EMAIL PROTECTED]
> Sent: Friday, 29 September 2006 10:04 AM
> To: beginners@perl.org
> Subject: Re: Querying very big database
>>
>> Toddy Prawiraharjo wrote:
>>
>>> Hello all,
>>>
>>> I am not sure if i'm inside the correct mailing list, but it should be
>>> simple for you guys. Recently i started a simple perl program to catch all
>>> syslog from my netscreen firewall, and put them into a database, so later
>>> can do some analysing on it (sort of AWStats). the database itself is pretty
>>> big, with 600k entry (for 3 weeks of the running firewall) worth 80 megs in
>>> mysql. My question is, during the analysing my perl script giving very slow
>>> processing. I know the query to mysql itself is pretty quick, less than 2
>>> seconds to return about 40k to 60k result to perl to be analysed
>>>
>>> Here it goes:
>>> <snipped>
>>>        $query = "SELECT src, rcvd FROM no_name WHERE start_date >=
>>> '$fromdate' AND start_date <= '$todate';";
>>>        $sth = $dbh->prepare($query);
>>>        $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
>>>        print "$query<br>";
>>>
>>>        my %src = ();
>>>        my ($totalsent, $totalrcvd, $srcchecked);
>>>
>>>        while(@row = $sth->fetchrow_array){
>>>                $srcchecked = 0;
>>>                while($source, $total = each (%src)){
>>>                        if ($source eq "$row[0]"){
>>>                                $srcchecked = 1;
>>>                                $src{$source} = $src{$source} + $row[1];
>>>                        }
>>>                }
>>>                if ($srcchecked != 1){
>>>                        print "$row[0]<br>";
>>>                        $src{$row[0]} = $row[1];
>>>                }
>>>                $totalrcvd = $totalsent + $row[1];
>>>                #$totalrcvd = $totalrcvd + $row[2];
>>>        }
>>> </snipped>
>>>
>>> The while loop to do analysing on the data take more than 15 minutes, and
>>> that only to a query for 1 day long records($fromdate-$todate)  So, if I
>>> want perl to give me faster result, what's the solution? Did I make
>>> fundamentally wrong approach?
>>
>>
>> You've missed the whole point of hashes, which is that they will access
>> a data
>> value directly from a key. What you've written is similar to
>>
>>    my $i = 0;
>>    my $value;
>>    foreach my $v (@array) {
>>      if ($i++ == $n) {
>>        $value = $v;
>>      }
>>    }
>>
>> instead of
>>
>>    $value = $array[$n];
>>
>> Try this while loop instead of the one you have and see if you get an
>> improvement:
>>
>>    while (@row = $sth->fetchrow_array) {
>>
>>      my $key = $row[0];
>>
>>      print "$key<br>" unless exists $src{$key};
>>      $src{$key} += $row[1];
>>
>>      $totalsent += $row[1];
>>      #$totalrcvd += $row[2];
>>    }
>>
>> HTH,
>>
>> Rob
>>
>
> THX! I always knew it's my n00b scripting that caused the problem. It
> now down to 4 lines inside while loop, and I'm flying! The processing
> down from ~15 minutes to 4 secs! But, for longer date range (2 weeks
> time limit) it clocked at 25 wallclock secs (with processing ~350k mysql
> return entries). Any more way to process this beast any faster? Any good
> reading or reference about this? Just afraid, if this rolled to
> production level, the report for months activities will take minutes to
> make!
>
> <snipped>
>         $query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
>   >= '$fromdate' AND start_date <= '$todate';";
>
>         $sth = $dbh->prepare($query);
>         $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
>         #print "$query<br>";
>
>         my %src = ();
>         my ($totalsent, $totalrcvd);
>
>         while(@row = $sth->fetchrow_array){
>                 my $key = $row[0];
>                 $src{$key} += $row[2];
>                 $totalsent = $totalsent + $row[1];
>                 $totalrcvd = $totalrcvd + $row[2];
>         }
> </snipped>

Hi Toddy

(Please bottom-post your replies. One day everybody in the world will know to do
this and perl.beginners will be a happy, shiny place!)

Let the database engine do it for you!

  my $query = qq(
    SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
    FROM no_name
    WHERE start_date >= '$fromdate' AND start_date <= '$todate'
    GROUP BY src
  );

  my $sth = $dbh->prepare($query);
  $sth->execute() || errorhere("SQL Error: $DBI::errstr");

  my $src = $sth->fetchall_hashref('src');

  foreach my $srcval(keys %$src) {
    $totalsent += $src->{$srcval}{totalsent};
    $totalrcvd += $src->{$srcval}{totalrcvd};
  }


(I couldn't easily test this, although it is syntax checked, so beware)

Now, $src is a reference to a hash similar to your original %src, but has
another hash reference as its values, so you can extract

  my $totalsent = $src{src1}{totalsent};

and so on.

HTH,

Rob

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>


Reply via email to