Dear dbi-users,

I'm pretty sure this is a bug -- the question is whether it is in my
code or somewhere in DBI or DBD::Pg.

I have the following code (platform details in attached file, but,
roughly, perl 5.6.1 on a (reasonably aggressively updated) mandrake 8.1
with the kernel at  2.4.8-26 on a 1.3 GHz Intel Celery with 256MB RAM,
postgres 7.2, DBI 1.30, DBD::Pg 1.13):

#!/usr/bin/perl -w
use DBI;
use Modules::ParseQS;


use constant SQL_CRTEMP => 'create temporary table llist as select
distinct on (host_name,pid,line) host_name,time_stamp,pid,line from
pganalysis where data like \'DEBUG:  QUERY ST%\'';

use constant SQL_FETEMP => 'select
llist.host_name,llist.pid,llist.line,pganalysis.sequence,llist.time_stamp,pganalysis.data
 from llist left outer join pganalysis using (host_name,pid,line) order by 
host_name,pid,line,sequence';

use constant PG_DBNAME  => "pganalysis";
use constant PG_DSOURCE => "dbi:Pg:dbname=".PG_DBNAME;
use constant PG_TABLE   => "psummary";
use constant PG_USER    => "mike";


sub CallBack {
  #
  # This sub is called by the parser when it reduces the "querystat"
  # rule.  It inserts the extracted values into the summary table
  #
  my $field_values = shift;
  my $dbh = shift;
  
  my @fields = sort keys %$field_values; 
  my @values = @{$field_values}{@fields};

  my $sql = sprintf "insert into %s (%s) values (%s)",
    PG_TABLE, join(",", @fields), join(",", ("?")x@fields);

  my $sth = $dbh->prepare_cached($sql);
  $sth->execute(@values);
}

#
# Set up the database handle
#

($dbh = DBI->connect(PG_DSOURCE,PG_USER)) 
  || die "Couldn't connect $dbh->errstr";

#
# Create the temporary table
#       
(defined ($rv = $dbh->do(SQL_CRTEMP)))
  || die "Failed to execute initial sql: ".$dbh->errstr;

#
# Prepare the statement handle for the main query
#
(defined ($sth = $dbh->prepare(SQL_FETEMP)))
  || die "Failed to prepare main sql: ". $dbh->errstr;

#
# Run the main query !!! DIES HERE !!!
#       
(defined ($rv = $sth->execute()))
  || die "Failed to execute main sql: ". $dbh->errstr;


#
# Create an instance of the parser
#
my $parser = ParseQS->new();

#
# Parse the query results
#
$parser->Run($sth,\&CallBack,$dbh);

$dbh->commit;
$dbh->disconnect;

The "execute" seems to be returning all the rows into the address space
of my program.  Since the result set is ~4 million rows, this has
diabolical consequences on memory and, in fact, the program gets whacked
by the kernel once it has exhausted memory (and caused my machine to
thrash wildly for quite some time...).  I've run this under the perl
debugger, and it dies before it gets to the fetchrow_arrayref contained
in the ParseQS module -- it never returns from the "execute".

So, where's the bug?  If it is in my code, I'd be grateful for tips on
the correct way to fetch big data.

Attached are the original Perl program (in case the listing above
becomes unreadable due to line mangulation), and some collected
information regarding platform and an analysis of the query as run by
the program.

Thanks very much.

Regards,

Mike

#!/usr/bin/perl -w
use DBI;
use Modules::ParseQS;


use constant SQL_CRTEMP => 'create temporary table llist as select distinct on (host_name,pid,line) host_name,time_stamp,pid,line from pganalysis where data like \'DEBUG:  QUERY ST%\'';

use constant SQL_FETEMP => 'select llist.host_name,llist.pid,llist.line,pganalysis.sequence,llist.time_stamp,pganalysis.data from llist left outer join pganalysis using (host_name,pid,line) order by host_name,pid,line,sequence';

use constant PG_DBNAME  => "pganalysis";
use constant PG_DSOURCE => "dbi:Pg:dbname=".PG_DBNAME;
use constant PG_TABLE   => "psummary";
use constant PG_USER    => "mike";


sub CallBack {
  #
  # This sub is called by the parser when it reduces the "querystat"
  # rule.  It inserts the extracted values into the summary table
  #
  my $field_values = shift;
  my $dbh = shift;
  
  my @fields = sort keys %$field_values; 
  my @values = @{$field_values}{@fields};

  my $sql = sprintf "insert into %s (%s) values (%s)",
    PG_TABLE, join(",", @fields), join(",", ("?")x@fields);

  my $sth = $dbh->prepare_cached($sql);
  $sth->execute(@values);
}

#
# Set up the database handle
#

($dbh = DBI->connect(PG_DSOURCE,PG_USER)) 
  || die "Couldn't connect $dbh->errstr";

#
# Create the temporary table
#	
(defined ($rv = $dbh->do(SQL_CRTEMP)))
  || die "Failed to execute initial sql: ".$dbh->errstr;

#
# Prepare the statement handle for the main query
#
(defined ($sth = $dbh->prepare(SQL_FETEMP)))
  || die "Failed to prepare main sql: ". $dbh->errstr;

#
# Run the main query
#	
(defined ($rv = $sth->execute()))
  || die "Failed to execute main sql: ". $dbh->errstr;


#
# Create an instance of the parser
#
my $parser = ParseQS->new();

#
# Parse the query results
#
$parser->Run($sth,\&CallBack,$dbh);

$dbh->commit;
$dbh->disconnect;
[mike@CPE-144-132-182-155 Code]$ perl -v

This is perl, v5.6.1 built for i386-linux

Copyright 1987-2001, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

[mike@CPE-144-132-182-155 Code]$ perl -V

Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
    osname=linux, osvers=2.4.8-11mdkenterprise, archname=i386-linux
    uname='linux no.mandrakesoft.com 2.4.8-11mdkenterprise #1 smp wed aug 22 16:05:18 
cest 2001 i686 unknown '
    config_args='-des -Darchname=i386-linux -Dd_dosuid -Ud_csh -Duseshrplib 
-Doptimize=-O3 -fomit-frame-pointer -pipe -mcpu=pentiumpro -march=i586 -ffast-math 
-fno-strength-reduce -Dprefix=/usr -Di_ndbm -Di_gdbm -Di_shadow -Di_syslog 
-Uuselargefiles -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/lib/perl5/man/man3'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing',
    optimize='-O3 -fomit-frame-pointer -pipe -mcpu=pentiumpro -march=i586 -ffast-math 
-fno-strength-reduce',
    cppflags='-fno-strict-aliasing'
    ccversion='', gccversion='2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)', 
gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.4.so, so=so, useshrplib=true, libperl=libperl.so
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic 
-Wl,-rpath,/usr/lib/perl5/5.6.1/i386-linux/CORE'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl): 
  Compile-time options:
  Built under linux
  Compiled at Sep  9 2001 23:58:45
  @INC:
    /usr/lib/perl5/5.6.1/i386-linux
    /usr/lib/perl5/5.6.1
    /usr/lib/perl5/site_perl/5.6.1/i386-linux
    /usr/lib/perl5/site_perl/5.6.1
    /usr/lib/perl5/site_perl
    .
psql (PostgreSQL) 7.2
contains support for: readline, history, multibyte
Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
Portions Copyright (c) 1996, Regents of the University of California
Read the file COPYRIGHT or use the command \copyright to see the
usage and distribution terms.
Linux CPE-144-132-182-155 2.4.8-26mdk #1 Sun Sep 23 17:06:39 CEST 2001 i686 unknown
psql (PostgreSQL) 7.2
contains support for: readline, history, multibyte
Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
Portions Copyright (c) 1996, Regents of the University of California
Read the file COPYRIGHT or use the command \copyright to see the
usage and distribution terms.
[mike@CPE-144-132-182-155 mike]$ uname -a
Linux CPE-144-132-182-155 2.4.8-26mdk #1 Sun Sep 23 17:06:39 CEST 2001 i686 unknown

[mike@CPE-144-132-182-155 Code]$ psql -q pganalysis

pganalysis=> create temporary table llist as select distinct on (host_name,pid,line) 
host_name,time_stamp,pid,line from pganalysis where data like 'DEBUG:  QUERY ST%';
pganalysis=> explain analyze select 
llist.host_name,llist.pid,llist.line,pganalysis.sequence,llist.time_stamp,pganalysis.data
 from llist left outer join pganalysis using (host_name,pid,line) order by 
host_name,pid,line,sequence;
NOTICE:  QUERY PLAN:

Sort  (cost=176.66..176.66 rows=1000 width=463) (actual time=503304..22..517880.09 
rows=3954509 loops=1)
  ->  Merge Join  (cost=69.83..126.83 rows=1000 width=463) (actual 
time=16217.80..242742.86 rows=3954509 loops=1)
        ->  Index Scan using pganalysis_pkey on pganalysis  (cost=0.00..52.00 
rows=1000 width=302) (actual time=54.65..137449.79 rows=4621267 loops=1)
        ->  Sort  (cost=69.83..69.83 rows=1000 width=161) (actual 
time=16162.98..37159.75 rows=3954498 loops=1)
              ->  Seq Scan on llist  (cost=0.00..20.00 rows=1000 width=161) (actual 
time=0.07..1109.88 rows=246702 loops=1)
Total runtime: 520393.01 msec

Reply via email to