[RFC: performance] Initializing DBI.pm

Stas Bekman Fri, 02 Jun 2000 17:54:39 -0700
Here is a complete version. Comments are very welcome before it enters the
guide:

The first example is the C<DBI> module. As you know, C<DBI> works with
many database drivers from the C<DBD::> category,
e.g. C<DBD::mysql>. It's not enough to preload C<DBI>; you should also
initialize it with the driver(s) you are going to use (usually a
single driver is used).

You probably already know that under mod_perl you should use the
C<Apache::DBI> module to get persistent connections, unless you open
a separate connection for each user, in which case you should not
use this module. C<Apache::DBI> automatically loads C<DBI> and
transparently overrides its C<connect()> and C<disconnect()> methods,
so you keep coding as if there were only the C<DBI> module.

Just as with module preloading, our goal is to find the startup
environment that leads to the smallest I<"difference"> between the
total and the shared memory reported, and therefore to the smallest
total memory usage.

Again, to make the measurement easy, we use only one child process,
with these settings in I<httpd.conf>:

  MinSpareServers 1
  MaxSpareServers 1
  StartServers 1
  MaxClients 1
  MaxRequestsPerChild 100

We are going to run memory benchmarks on five different versions of
the I<startup.pl> file.  We always preload these modules:

  use GTop ();
  use Apache::DBI (); # preloads DBI as well

=over

=item option 1

Leave the file unmodified.

=item option 2

Install the MySQL driver (we use the MySQL RDBMS for our tests):

  DBI->install_driver("mysql");

=item option 3

Preload the MySQL driver module:

  use DBD::mysql;

=item option 4

Tell C<Apache::DBI> to connect to the database when the child process
starts (at ChildInit time). Note that no driver is preloaded before
the child is spawned:

  Apache::DBI->connect_on_init('DBI:mysql:test::localhost',
                               "",
                               "",
                               {
                                PrintError => 1, # warn() on errors
                                RaiseError => 0, # don't die on error
                                AutoCommit => 1, # commit executes
                                                 # immediately
                               }
                              );

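=item option 5

Combine options 2 and 4: install the MySQL driver in the parent
process and also tell C<Apache::DBI> to connect to the database when
each child process starts.

=back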
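For reference, here is a sketch of a complete I<startup.pl> that combines options 2 and 4. This corresponds to the configuration labeled I<install_driver & connect_on_init> in the results below. The sketch makes two assumptions. First, it assumes the file is loaded with C<PerlRequire>. Second, it reuses exactly the same DSN, user, password and attributes as the test script, because, as far as I understand, C<Apache::DBI> only hands back a cached connection when the connect arguments match. It has not been tested as shown.

```perl
# startup.pl -- sketch combining option 2 (install_driver) and
# option 4 (connect_on_init); assumed to be loaded via PerlRequire
use strict;

use GTop ();
use Apache::DBI ();   # load before anything else that uses DBI; loads DBI too

# Option 2: load and initialize DBD::mysql in the parent process,
# so its memory pages can be shared by all the children
DBI->install_driver("mysql");

# Option 4: open the connection in each child when it starts.
# connect_on_init() only records the arguments; the connection is made
# later, in the child, so there is no handle here to check for errors.
# The arguments must match the ones used in the handler (assumption).
Apache::DBI->connect_on_init('DBI:mysql:test::localhost',
                             "",
                             "",
                             {
                              PrintError => 1, # warn() on errors
                              RaiseError => 0, # don't die on error
                              AutoCommit => 1, # commit executes immediately
                             }
                            );

1;  # a file loaded with PerlRequire must return a true value
```

The driver is installed in the parent so that its pages end up shared. The per-child connection is deferred so that each child gets its own database socket rather than one inherited across C<fork()>.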

Here is the C<Apache::Registry> test script that we used:

  preload_dbi.pl
  --------------
  use strict;
  use GTop ();
  use DBI ();
    
  my $dbh = DBI->connect("DBI:mysql:test::localhost",
                         "",
                         "",
                         {
                          PrintError => 1, # warn() on errors
                          RaiseError => 0, # don't die on error
                          AutoCommit => 1, # commit executes
                                           # immediately
                         }
                        )
    or die "Cannot connect to database: $DBI::errstr\n";
  
  my $r = shift;
  $r->send_http_header('text/plain');
  
  my $do_sql = "show tables";
  my $sth = $dbh->prepare($do_sql);
  $sth->execute();
  my @data = ();
  while (my @row = $sth->fetchrow_array){
    push @data, @row;
  }
  print "Data: @data\n";
  $dbh->disconnect(); # NOP under Apache::DBI
  
  my $proc_mem = GTop->new->proc_mem($$);
  my $size  = $proc_mem->size;
  my $share = $proc_mem->share;
  my $diff  = $size - $share;
  printf "%8s %8s %8s\n", qw(Size Shared Diff);
  printf "%8d %8d %8d (bytes)\n",$size,$share,$diff;

The script opens a connection to the database I<'test'> and issues a
query to learn what tables the database has. Once the data is
collected and printed, the connection would normally be closed, but
C<Apache::DBI> overrides C<disconnect()> with an empty method. Finally,
the script prints the memory usage, using code you are already
familiar with.

The server was restarted before each new test.

Here are the results of the five tests, sorted by the I<Diff> column
(all sizes are in bytes):

=over

=item 1

After the first request:

  Version     Size   Shared     Diff        Test type
  --------------------------------------------------------------------
        1  3465216  2621440   843776  install_driver
        2  3461120  2609152   851968  install_driver & connect_on_init
        3  3465216  2605056   860160  preload driver
        4  3461120  2494464   966656  nothing added
        5  3461120  2482176   978944  connect_on_init

=item 2

After the second request (all subsequent requests showed the same
results):

  Version     Size   Shared     Diff        Test type
  --------------------------------------------------------------------
        1  3469312  2609152   860160  install_driver
        2  3481600  2605056   876544  install_driver & connect_on_init
        3  3469312  2588672   880640  preload driver
        4  3477504  2482176   995328  nothing added
        5  3481600  2469888  1011712  connect_on_init

=back

What can we conclude from these numbers? First, we see that only
after the second request do we get the final memory footprint for the
specific request in question (if you pass different arguments, the
memory usage may well be different).

Both tables show the same pattern of memory usage. The clear winner
is the version of I<startup.pl> in which the MySQL driver was
installed (1). Since we want a connection ready for the first request
made to a freshly spawned child process, we generally use the second
version (2), which uses somewhat more memory but has almost the same
number of shared memory pages. The third version only preloads the
driver, which results in less shared memory. The last two versions
have nothing initialized (4) or use only the connect_on_init() method
(5). The former is a little better than the latter, but both are
significantly worse than the first two versions.

To remind you why we look for the smallest value in the I<Diff>
column, recall the real memory usage formula:

  RAM_dedicated_to_mod_perl = diff * number_of_processes
                            + largest_shared_size

Notice that the smaller the diff, the more processes you can run
using the same amount of RAM. Every 100K of difference counts once
you multiply it by the number of processes. If we take the numbers
for versions (1) and (5) after the second request, and assume that we
have 256M of memory dedicated to mod_perl processes, we get the
following numbers using a formula derived from the one above:

               RAM - largest_shared_size
  N_of_Procs = -------------------------
                        Diff

                268435456 - 2609152
  (ver 1)  N =  ------------------- = 309
                      860160

                268435456 - 2469888
  (ver 5)  N =  ------------------- = 262
                     1011712

So the difference is significant: version (1) allows almost 18% more
child processes than version (5).
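The same calculation can be run for all five versions with a few lines of Perl. This is a sketch with three assumptions. It assumes 256M of RAM dedicated to mod_perl, as above. For I<largest_shared_size> it uses each version's own I<Shared> value from the second-request table. The helper name C<max_procs()> is hypothetical, made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# N_of_Procs = (RAM - largest_shared_size) / Diff, rounded down because
# only whole processes count. max_procs() is a hypothetical helper name.
sub max_procs {
    my ($ram, $shared, $diff) = @_;
    return int(($ram - $shared) / $diff);
}

my $ram = 256 * 1024 * 1024;    # assumption: 256M dedicated to mod_perl

# [version, test type, Shared, Diff], second-request table, in bytes
my @results = (
    [1, 'install_driver',                   2609152,  860160],
    [2, 'install_driver & connect_on_init', 2605056,  876544],
    [3, 'preload driver',                   2588672,  880640],
    [4, 'nothing added',                    2482176,  995328],
    [5, 'connect_on_init',                  2469888, 1011712],
);

for my $row (@results) {
    my ($ver, $type, $shared, $diff) = @$row;
    printf "%d  %-32s  %4d processes\n",
        $ver, $type, max_procs($ram, $shared, $diff);
}
```

Running it prints one line per version. The first and last lines reproduce the 309 and 262 processes computed above.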


_____________________________________________________________________
Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide 
http://perl.org     http://stason.org/TULARC
http://singlesheaven.com http://perlmonth.com http://sourcegarden.org