DBI Roadmap

Tim Bunce Mon, 09 Aug 2004 16:14:50 -0700

Comments welcome.


=head1 DBI ROAD-MAP

9th August 2004

This document aims to provide a high level overview of the future direction of the DBI.

It outlines the broad categories of changes, along with some rationale,
but does not go into implementation details and ignores many more
minor planned enhancements.  More details can be found in:

  http://svn.perl.org/modules/dbi/trunk/ToDo

(username guest, password guest)


=head2 Unicode

Use of Unicode with the DBI is growing rapidly. The DBI could do more
to help drivers support Unicode and help applications work with drivers
that don't yet support Unicode directly.

* Define expected behavior for fetching data and binding parameters.

* Fix 'leaking' of UTF8 flag from one row to the next.

* Provide interfaces to support Unicode issues for XS and pure Perl drivers
and applications.


=head2 Testing

The DBI has a test suite. Every driver has a test suite.  Each is
limited in its scope.  A driver's test suite tests for behavior
that matches what the driver author thinks the DBI specifies, and
that understanding may be subtly incorrect.  These test suites are
poorly maintained because the development cost is relatively high
compared to the "return" from a single driver.

A common test suite that can be reused by all the drivers is needed.
The benefits include:

* Ensuring all drivers conform to the DBI specification.
This eases the porting of applications between databases and the
implementation of database independent reusable code modules
layered over the DBI.

* Improving the coverage of the DBI and driver code tested by the
test suite.  Driver authors and others will be more motivated to
contribute to the common test suite as the gains are multiplied by
the number of drivers in use.

* Improving the DBI specification by prompting the clarification
of fuzzy issues in order to implement test cases.

* Generating documentation about driver functionality automatically
from the testing process.  Areas of missing functionality can be
highlighted to encourage enhancements.

* Improving the testing of DBI subclassing, DBI::PurePerl and the
various "plumbing" drivers, such as DBD::Proxy and DBD::Multiplex,
by automatically running the test suite through them.


=head2 Performance

The DBI has always treated performance as a priority. However some
parts of the implementation remain unoptimized, especially in
relation to threads.

* The mechanism by which drivers access the core "DBI State" structure
(DBIS) is very inefficient when perl is built to support threads
(including mod_perl 2).

* The PERL_NO_GET_CONTEXT mechanism is not used by the DBI or drivers
so their use of Perl API functions is significantly less efficient.

* The majority of the handle creation code, including TIEHASH, is
implemented in Perl.  Moving most of this to C will speed up handle
creation significantly.

* The popular fetchrow_hashref() method is many times slower than
fetchrow_arrayref() because it has to re-get the names of the fields
each time. A $h->{FetchHashReuse} attribute would allow the same
hash to be reused each time making fetchrow_hashref() about the
same speed as fetchrow_arrayref().
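
The hash-reuse idea in the last point can be illustrated without the DBI
at all. The following is only a sketch: the field names, the rows and the
fetch_into_reused_hash() function are stand-ins invented for the example,
not part of the DBI or of any planned FetchHashReuse implementation.

```perl
use strict;
use warnings;

# Stand-ins for a result set; a real driver would supply these.
my @field_names = qw(id name);
my @rows        = ( [ 1, 'alice' ], [ 2, 'bob' ] );

# One hash, allocated once and refilled for every row.
my %reused_row;

sub fetch_into_reused_hash {
    my ($row) = @_;
    @reused_row{@field_names} = @$row;    # hash slice overwrites the values in place
    return \%reused_row;
}

my ( @seen, @kept );
for my $row (@rows) {
    my $h = fetch_into_reused_hash($row);
    push @seen, "$h->{id}:$h->{name}";    # safe: the values are copied out immediately
    push @kept, $h;                       # risky: every entry is the same hash
}

print join( ',', @seen ), "\n";
print "first kept row now reads: $kept[0]{name}\n";
```

The trade-off is visible in @kept: because every fetch returns a reference
to the same hash, an application that stores the reference instead of
copying the values sees only the most recent row.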


=head2 Introspection

* The methods of the DBI API are installed dynamically when the DBI
is loaded.  The data structure used to define the methods and their
dispatch behavior should be made part of the DBI API. This would
enable more flexible and correct behavior by both modules subclassing
the DBI and especially dynamic drivers such as DBD::Proxy and
DBD::Multiplex.

* All the handle attributes and related 'metadata' should also be
made available for the same reasons. It's common for DBD::Proxy,
for example, to not treat new attributes correctly because it's not
been taught about them.

* Currently it is not possible to discover all the child statement
handles that belong to a database handle (or all database handles
that belong to a driver handle).  This makes certain tasks more
difficult, especially some debugging scenarios.  A cache of
weak-references to child handles would solve the problem without
creating reference loops.

* A DBI handle is a reference to a tied hash and so has an 'outer'
hash that the handle reference points to and an 'inner' hash holding
the DBI data.  By allowing the inner handle to be changed, for
example swapped with a different handle, many new behaviors become
possible. For example, a database handle to a database that has crashed
could have its inner handle changed to a new connection to a replica.

* It is often useful to know what handle attributes have been changed
since the handle was created (e.g., in mod_perl where a handle needs
to be reset or cloned). This will become more important as developers
start exploring the ability to change the inner handle.
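
The cache of weak references to child handles mentioned above could work
along the following lines. This is a sketch under stated assumptions: the
Sketch::Parent and Sketch::Child classes and their methods are
hypothetical stand-ins for database and statement handles, and only
Scalar::Util::weaken() is a real, existing function.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical stand-in for a database handle.
package Sketch::Parent {
    sub new {
        my ($class) = @_;
        return bless { children => [] }, $class;
    }

    sub new_child {
        my ( $self, $label ) = @_;
        # The child holds a strong reference to its parent.
        my $child = bless { label => $label, parent => $self }, 'Sketch::Child';
        push @{ $self->{children} }, $child;
        # The parent holds only a weak reference, so there is no reference loop.
        Scalar::Util::weaken( $self->{children}[-1] );
        return $child;
    }

    # Slots belonging to destroyed children have become undef.
    sub live_children {
        my ($self) = @_;
        return grep { defined } @{ $self->{children} };
    }
}

my $parent = Sketch::Parent->new;
my $first  = $parent->new_child('first');
{
    my $second = $parent->new_child('second');
    print scalar( $parent->live_children ), " live children inside the block\n";
}
print scalar( $parent->live_children ), " live child after the block\n";
```

Because the parent never owns its children, a child is destroyed as soon
as the application drops it, and the cache needs no explicit cleanup
beyond skipping the undefined slots.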


=head2 High Availability and Load Balancing

* The DBD::Multiplex driver is intended to enable a wide range of
dynamic functionality including support for various high-availability
and load-balancing scenarios.  The old version has been used
successfully but was limited. It's being rewritten to greatly
increase its flexibility and has great potential, but development
has stalled.

* The DBD::Proxy module is complex and relatively inefficient because
it's trying to be a complete proxy for most DBI method calls at
both the database handle and statement handle levels.  For many
applications a simpler proxy architecture that operates with a
single round-trip to the server would be sufficient (result rows
of SELECT statements would be serialized into the response).

Apart from the efficiency gains, that would also enable the use of
stateless servers, which in turn enables the use of a pool of servers
for high-availability and load balancing.

I envisage a driver base class that implements everything except
the 'transport' mechanism and then multiple drivers using the base
class with specific transports.  For example, one such transport
could be the Spread::Queue module.
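
A rough sketch of that split between a transport-independent base class
and a transport-specific subclass follows. All of the class and method
names (Sketch::Gateway::Base, Sketch::Gateway::InProcess, run, transmit)
are hypothetical, and Storable is used here only as one plausible way to
serialize the request and the result rows.

```perl
use strict;
use warnings;
use Storable ();

# Hypothetical base class: everything except how the bytes reach the server.
package Sketch::Gateway::Base {
    sub new {
        my ( $class, %args ) = @_;
        return bless { %args }, $class;
    }

    # One round trip: the request carries the statement and its bind values,
    # the response carries all of the result rows.
    sub run {
        my ( $self, $statement, @bind_values ) = @_;
        my $request = Storable::freeze(
            { statement => $statement, bind => \@bind_values } );
        my $response = $self->transmit($request);    # supplied by the subclass
        return Storable::thaw($response)->{rows};
    }

    sub transmit { die "a transport subclass must implement transmit()\n" }
}

# Hypothetical transport that calls a handler in the same process.
package Sketch::Gateway::InProcess {
    our @ISA = ('Sketch::Gateway::Base');

    sub transmit {
        my ( $self, $frozen_request ) = @_;
        my $request = Storable::thaw($frozen_request);
        my $rows    = $self->{handler}->($request);
        return Storable::freeze( { rows => $rows } );
    }
}

my $gateway = Sketch::Gateway::InProcess->new(
    handler => sub {
        my ($request) = @_;
        # A real server would execute the statement against a database here.
        return [ [ $request->{statement}, scalar @{ $request->{bind} } ] ];
    },
);

my $rows = $gateway->run( 'SELECT name FROM users WHERE id = ?', 42 );
print "$rows->[0][0] with $rows->[0][1] bind value(s)\n";
```

Since each call to run() is complete in itself, the server side keeps no
state between requests, which is what would allow any server in a pool to
answer it.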


=head2 Extensibility

The DBI can be extended in three main dimensions: subclassing the
DBI, subclassing a driver, and callback hooks. Each has different
pros and cons and each is most applicable in different situations.

* Subclassing the DBI is functional but not well defined and some
key elements are incomplete, particularly the DbTypeSubclass mechanism
(that automatically subclasses to a class tree corresponding to the
type of database being used).  It also needs more thorough testing.

* Subclassing a driver is undocumented, poorly tested and very
probably incomplete. However it's a powerful way to embed certain
kinds of functionality 'below' applications while avoiding some of
the side-effects of subclassing the DBI (especially in relation to
error handling).

* Callbacks are currently limited to error handling (the HandleError
and HandleSetError attributes).  Providing callback hooks for more
events, such as a row being fetched, would enable utility modules,
for example, to add functionality independent of any subclassing
in use.


=head2 Database Portability

* The DBI has not yet addressed the issue of portability among SQL
dialects.  This is the main hurdle in the way of database portability
for the DBI.

The goal is not to fully parse the SQL and rewrite it in a different
dialect.  That's well beyond the scope of the DBI and should be
left to layered modules.  However, a simple token rewriting mechanism
for five comment styles, two quoting styles, four placeholder styles,
plus the ODBC "{foo ...}" escape syntax is sufficient to significantly
raise the level of SQL portability.

* Another major problem area is date/time formatting.  Since version 1.41
the DBI has defined a way to express that dates should be fetched
in SQL standard date format (YYYY-MM-DD).  However it requires the
bind_col() method to be called on applicable columns.  This is one
example of the more general case where bind_col() needs to be called
with particular attributes on all columns of a particular type.

A mechanism is needed whereby an application can specify default bind_col()
attributes for each column type. So with a single step all DATE type
columns, for example, can be set to be returned in the standard format.
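
To give a feel for the simple token rewriting described in the first
point, here is a minimal sketch that handles just one case: turning
":name" placeholders into "?" while leaving single-quoted literals and
"--" comments untouched. The rewrite_named_placeholders() function is
hypothetical and is not a DBI interface; the other comment, quoting and
placeholder styles are deliberately left out.

```perl
use strict;
use warnings;

# Rewrite ":name" placeholders to "?" and return the names in order.
sub rewrite_named_placeholders {
    my ($sql) = @_;
    my @names;
    $sql =~ s{
        ( ' (?: [^'] | '' )* ' )            # quoted literal, copied unchanged
      | ( -- [^\n]* )                       # line comment, copied unchanged
      | (?<! : ) : ( [A-Za-z_] \w* )        # named placeholder, not part of "::"
    }{
          defined $1 ? $1
        : defined $2 ? $2
        :              do { push @names, $3; '?' }
    }gex;
    return ( $sql, \@names );
}

my ( $rewritten, $names ) = rewrite_named_placeholders(
    "SELECT * FROM t WHERE a = :a AND b = ':not_me' AND c = :c -- :nor_me"
);
print "$rewritten\n";
print join( ',', @$names ), "\n";
```

The point of matching literals and comments first is that anything inside
them is consumed as a single token, so a colon that appears there is never
mistaken for a placeholder. No SQL parsing is involved.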


=head2 Debug-ability

* Reduce the "noise" when the trace level is set high by moving more trace
output to be enabled by the new named-topic trace mechanism.

* Calls to XS functions (such as many DBI and driver methods) don't
normally appear in the call stack.  Optionally enabling that would
enable more useful diagnostics to be produced.

* Integration with the Perl debugger would make it simpler to perform
actions on a per-handle basis (such as breakpoint on execute,
breakpoint on error).


=head2 Other Enhancements

* Support non-blocking mode for drivers that can enable it in their
client API.

* Scroll-cursor support


=head2 Parrot and Perl 6

The current DBI implementation in C code is very unlikely to run
on Perl 6.  Perl 6 will target the Parrot virtual machine and so
the internal architecture will be radically different from Perl 5.

The most natural language to implement Perl 6 extensions will be
Parrot Intermediate Representation (PIR). Since Parrot includes a
Native Call Interface, extensions implemented in PIR should not
need a compiler in order to interface to database client API shared
libraries.

It is a goal of the Parrot project to be a suitable target for many
dynamic languages (including Python, PHP, Ruby, etc) and to enable
those languages to reuse each other's modules. So a database interface
for Parrot is also a database interface for all those languages.

The Perl DBI is more mature and featureful than the database
interfaces of the other languages and so would make an excellent
base for the Parrot Database interface.

My plan is to better define the API between the DBI and the drivers and
use that API as the primary API for the 'raw' Parrot database interface.
This project is known as Parrot DBDI.  Here's my announcement:

  http://groups.google.com/[EMAIL PROTECTED]

(The project stalled, due to Parrot not having key functionality
at the time, and has yet to be restarted.)

The bulk of the DBI code actually exists in base classes 'behind'
the driver API.  The method dispatcher code that Perl applications
interface with is relatively small.

Each language targeting Parrot would implement their own small
language-specific dispatcher over the Parrot DBDI interface.

A "big win" here is that a much wider community of developers share
the same database drivers and so the benefits of the Open Source
model are magnified.

The bulk of the work will be translating the C and Perl base class
code into Parrot PIR or a suitable language that generates PIR.


=head1 PRIORITIES

The foundations of many of the changes described above require
changes to the interface between the DBI and drivers. To clearly
define the transition point the source code will be forked into a
DBI v1 branch and the mainline bumped to v2.

DBI v1 will continue to be maintained for bug fixes and any
enhancements that ease the transition to DBI v2.

=head2 Transition Drivers

The first priority is to make all the infrastructure changes that
impact drivers and make an alpha release available that driver
authors can target.  As far as possible the changes will be implemented
in a way that enables driver authors to use the same code base for DBI
v1 and DBI v2.

The main changes required by driver authors are:

* Code changes for PERL_NO_GET_CONTEXT, plus removing PERL_POLLUTE
and DBIS

* Code changes in DBI/DBD interface (new way to create handles, new
callbacks etc)

* Common test suite infrastructure (driver-specific test base class)

=head2 Transition Applications

At the same time a small set of incompatible changes that may impact
some applications will also be made. See
http://svn.perl.org/modules/dbi/trunk/ToDo (login guest/guest).

=head2 Incremental Developments

Once DBI v2.0 is available the other enhancements can be implemented
incrementally on the updated foundations. The priorities of those
changes can be set in the light of the circumstances at that time.

=cut
