Hi guys,

I was hacking tonight and I got the idea that I should share with you some of my
database performance tricks.  I hope this will be a good reference for when
people are designing their web application architectures.  I will eventually
post this on my website alongside my DBI tutorial.  Maybe Stas will put it in
the guide (wink wink).

Mail suggestions and comments to me at [EMAIL PROTECTED]

Enjoy,
Jeffrey

Analysis of the Problem

A common web application architecture is one or more application servers which
handle requests from client browsers by consulting one or more database servers
and performing a transform on the data.  When an application must consult the
database on every request, the interaction with the database server becomes the
central performance issue.  Spending a bit of time optimizing your database
access can result in significant application performance improvements.  In this
analysis, a system using Apache, mod_perl, DBI, and Oracle will be considered. 
The application server uses Apache and mod_perl to service client requests, and
DBI to communicate with a remote Oracle database.  

In the course of servicing a typical client request, the application server must
retrieve some data from the database and execute a stored procedure.  There are
several steps that need to be done to complete the request:

1: Connect to the database server
2: Prepare a SQL SELECT statement
3: Execute the SELECT statement
4: Retrieve the results of the SELECT statement
5: Release the SELECT statement handle
6: Prepare a PL/SQL stored procedure call
7: Execute the stored procedure
8: Release the stored procedure statement handle
9: Commit or rollback
10: Disconnect from the database server

In this document, an application will be described which achieves maximum
performance by eliminating some of the steps above and optimizing others.

Optimizing Database Connections

A naive implementation would perform steps 1 through 10 from above on every
request.  A portion of the source code might look like this:

# ...
my $dbh = DBI->connect('dbi:Oracle:host', 'user', 'pass')
        || die $DBI::errstr;

my $baz = $r->param('baz');

eval {
        my $sth = $dbh->prepare(qq{
                SELECT foo 
                  FROM bar 
                 WHERE baz = $baz
        });
        $sth->execute;

        while (my @row = $sth->fetchrow_array) {
                # do HTML stuff
        }
        
        $sth->finish;

        my $sph = $dbh->prepare(qq{
                BEGIN
                        my_procedure(
                                arg_in => $baz
                        );
                END;
        });
        $sph->execute;
        $sph->finish;
        
        $dbh->commit;
};
if ($@) {
        $dbh->rollback;
}

$dbh->disconnect;

# ...

In practice, such an implementation would have hideous performance problems. 
The majority of the execution time of this program would likely be spent
connecting to the database.  An examination shows that step 1 is comprised of
many smaller steps:

1: Connect to the database server
1a: Build client-side data structures for an Oracle connection
1b: Look up the server's alias in a file
1c: Look up the server's hostname
1d: Build a socket to the server
1e: Build server-side data structures for this connection

The naive implementation waits for all of these steps to happen, and then throws
away the database connection when it is done!  This is obviously wasteful, and
easily rectified.  The best solution is to hoist the database connection step
out of the per-request lifecycle so that more than one request can use the same
database connection.  This can be done by connecting to the database server
once, and then not disconnecting until the Apache child process exits.  The
Apache::DBI module does this transparently and automatically with little effort
on the part of the programmer.

Apache::DBI intercepts calls to DBI's connect and disconnect methods and
replaces them with its own.  Apache::DBI caches database connections when they
are first opened, and it ignores disconnect commands.  When an application tries
to connect to the same database, Apache::DBI returns a cached connection, thus
saving the significant time penalty of repeatedly connecting to the database.  A
full treatment of Apache::DBI doesn't belong in this document, but you can find
more information in Stas Beckman's mod_perl guide at
http://perl.apache.org/guide/ .

When Apache::DBI is in use, none of the code in the example needs to change. 
The code is upgraded from naive to respectable with the use of a simple module! 
The first and biggest database performance problem is quickly dispensed with.

Utilizing the Database Server's Cache

Most database servers, including Oracle, utilize a cache to improve the
performance of recently seen queries.  The cache is keyed on the SQL statement. 
If a statement is identical to a previously seen statement, the execution plan
for the previous statement is reused.  This can be a considerable improvement
over building a new statement execution plan.

Our respectable implementation from the last section is not making use of this
caching ability.  It is preparing the statement "SELECT foo FROM bar WHERE baz =
$baz".  The problem is that $baz is being read from an HTML form, and is
therefore likely to change on every request.  When the database server sees this
statement, it is going to look like "SELECT foo FROM bar WHERE baz = 1", and on
the next request, the SQL will by "SELECT foo FROM bar WHERE baz = 42".  Since
the statements are different, the database server will not be able to reuse its
execution plan, and will proceed to make another one.  This defeats the purpose
of the SQL statement cache.

The application server needs to make sure that SQL statements which are the same
look the same.  The way to achieve this is to use placeholders and bound
parameters.  The placeholder is a blank in the SQL statement, which tells the
database server that the value will be filled in later.  The bound parameter is
the value which is inserted into the blank before the statement is executed.

With placeholders, the SQL statement looks like "SELECT foo FROM bar WHERE baz =
:baz".  Regardless of whether baz is 1 or 42, the SQL always looks the same, and
the database server can reuse its cached execution plan for this statement. 
This technique has eliminated the execution plan generation penalty from the
per-request runtime.  The potential performance improvement from this
optimization could range from modest to very significant.

Here is the updated code fragment which employs this optimization:

# ...
my $dbh = DBI->connect('dbi:Oracle:host', 'user', 'pass')
        || die $DBI::errstr;

my $baz = $r->param('baz');

eval {
        my $sth = $dbh->prepare(qq{
                SELECT foo 
                  FROM bar 
                 WHERE baz = :baz
        });
        $sth->bind_param(':baz', $baz);
        $sth->execute;

        while (my @row = $sth->fetchrow_array) {
                # do HTML stuff
        }
        
        $sth->finish;

        my $sph = $dbh->prepare(qq{
                BEGIN
                        my_procedure(
                                arg_in => :baz
                        );
                END;
        });
        $sph->bind_param(':baz', $baz);
        $sph->execute;
        $sph->finish;
        
        $dbh->commit;
};
if ($@) {
        $dbh->rollback;
}

# ...

Eliminating SQL Statement Parsing

The example program has certainly come a long way and the performance is now
probably much better than that of the first revision.  However, there is still
more speed that can be wrung out of this server architecture.  The last
bottleneck is in SQL statement parsing.  Every time DBI's prepare method is
called, DBI parses the SQL command looking for placeholder strings, and does
some housekeeping work.  Worse, a context has to be built on the client and
server sides of the connection which the database will use to refer to the
statement.  These things take time, and by eliminating these steps the time can
be saved.

To get rid of the statement handle construction and statement parsing penalties,
we could use DBI's prepare_cached method.  This method compares the SQL
statement to others that have already been executed.  If there is a match, the
cached statement handle is returned.  But the application server is still
spending time calling an object method (very expensive in Perl), and doing a
hash lookup.  Both of these steps are unnecessary, since the SQL is very likely
to be static and known at compile time.  The smart programmer can take advantage
of these two attributes to gain better database performance.  In this example,
the database statements will be prepared immediately after the connection to the
database is made, and they will be cached in package scalars to eliminate the
method call.

What is needed is a routine that will connect to the database and prepare the
statements.  Since the statements are dependent upon the connection, the
integrity of the connection needs to be checked before using the statements, and
a reconnection should be attempted if needed.  Since the routine presented here
does everything that Apache::DBI does, it does not use Apache::DBI and therefore
has the added benefit of eliminating a cache lookup on the connection.

Here is an example of such a package:

package My::DB;

use strict;
use DBI;

sub connect {
        if (defined $My::DB::conn) {
                eval {
                        $My::DB::conn->ping;
                };
                if (!$@) {
                        return $My::DB::conn;
                }
        }

        $My::DB::conn = DBI->connect(
                'dbi:Oracle:server', 'user', 'pass', {
                        PrintError => 1,
                        RaiseError => 1,
                        AutoCommit => 0
                }
        ) || die $DBI::errstr; #Assume application handles this

        $My::DB::select = $My::DB::conn->prepare(q{
                SELECT foo
                  FROM bar
                 WHERE baz = :baz
        });
        
        $My::DB::procedure = $My::DB::conn->prepare(q{
                BEGIN
                        my_procedure(
                                arg_in => :baz
                        );
                END;
        });

        return $My::DB::conn;
}

1;

Now the example program needs to be modified to use this package.

# ...
my $dbh = My::DB->connect;

my $baz = $r->param('baz');

eval {
        my $sth = $My::DB::select;
        $sth->bind_param(':baz', $baz);
        $sth->execute;

        while (my @row = $sth->fetchrow_array) {
                # do HTML stuff
        }

        my $sph = $My::DB::procedure;
        $sph->bind_param(':baz', $baz);
        $sph->execute;
        
        $dbh->commit;
};
if ($@) {
        $dbh->rollback;
}

# ...


Notice that several improvements have been made.  Since the statement handles
have a longer life than the request, there is no need for each request to
prepare the statement, and no need to call the statement handle's finish
method.  Since Apache::DBI and the prepare_cached method are not used, no cache
lookups are needed.

Conclusion

The number of steps needed to service the request in the example system has been
reduced significantly.  In addition, the hidden cost of building and tearing
down statement handles and of creating query execution plans is removed. 
Compare the new sequence with the original:

1: Check connection to database
2: Bind parameter to SQL SELECT statement
3: Execute SELECT statement
4: Fetch rows
5: Bind parameters to PL/SQL stored procedure
6: Execute PL/SQL stored procedure
7: Commit or rollback

It is probably possible to optimize this example even further, but I have not
tried.  It is very likely that the time could be better spent improving your
database indexing scheme or web server buffering and load balancing.  If there
are any suggestions for further optimization of the application-database
interaction, please mail them to me at [EMAIL PROTECTED]

Jeffrey Baker
4 October 1999

Reply via email to