Comments welcome.

=head1 DBI ROAD-MAP

9th August 2004

This document aims to provide a high-level overview of the future direction of the DBI. It outlines the broad categories of change, along with some rationale, but does not go into implementation details and ignores many more minor planned enhancements. More details can be found in http://svn.perl.org/modules/dbi/trunk/ToDo (username guest, password guest).

=head2 Unicode

Use of Unicode with the DBI is growing rapidly. The DBI could do more to help drivers support Unicode and to help applications work with drivers that don't yet support Unicode directly.

* Define the expected behavior for fetching data and binding parameters.

* Fix the 'leaking' of the UTF8 flag from one row to the next.

* Provide interfaces to support Unicode issues for XS and pure Perl drivers and applications.

=head2 Testing

The DBI has a test suite, and every driver has a test suite, but each is limited in its scope. A driver's test suite tests for behavior that matches what the driver author thinks the DBI specifies, which may be subtly incorrect. These test suites are poorly maintained because the development cost is relatively high compared to the "return" from a single driver.

A common test suite that can be reused by all the drivers is needed. The benefits include:

* Ensuring all drivers conform to the DBI specification. This eases the porting of applications between databases and the implementation of database-independent reusable code modules layered over the DBI.

* Improving the coverage of the DBI and driver code tested by the test suite. Driver authors and others will be more motivated to contribute to the common test suite because the gains are multiplied by the number of drivers in use.

* Improving the DBI specification by prompting the clarification of fuzzy issues in order to implement test cases.

* Generating documentation about driver functionality automatically as part of the testing process. Areas of missing functionality can then be highlighted to encourage enhancements.

* Improving the testing of DBI subclassing, DBI::PurePerl and the various "plumbing" drivers, such as DBD::Proxy and DBD::Multiplex, by automatically running the test suite through them.

=head2 Performance

The DBI has always treated performance as a priority. However, some parts of the implementation remain unoptimized, especially in relation to threads.

* The mechanism by which drivers access the core "DBI State" structure (DBIS) is very inefficient when perl is built to support threads (including for mod_perl 2).

* The PERL_NO_GET_CONTEXT mechanism is not used by the DBI or the drivers, so their use of Perl API functions is significantly less efficient.

* The majority of the handle creation code, including TIEHASH, is implemented in Perl. Moving most of this to C will speed up handle creation significantly.

* The popular fetchrow_hashref() method is many times slower than fetchrow_arrayref() because it has to get the names of the fields again for every row. A $h->{FetchHashReuse} attribute would allow the same hash to be reused each time, making fetchrow_hashref() about as fast as fetchrow_arrayref().

=head2 Introspection

* The methods of the DBI API are installed dynamically when the DBI is loaded. The data structure used to define the methods and their dispatch behavior should be made part of the DBI API. This would enable more flexible and correct behavior both in modules that subclass the DBI and, especially, in dynamic drivers such as DBD::Proxy and DBD::Multiplex.

* All the handle attributes and related 'metadata' should also be made available, for the same reasons. DBD::Proxy, for example, often fails to handle new attributes correctly because it has not been taught about them.

* Currently it is not possible to discover all the child statement handles that belong to a database handle (or all the database handles that belong to a driver handle). This makes certain tasks more difficult, especially some debugging scenarios.
A cache of weak references to child handles would solve the problem without creating reference loops.

* A DBI handle is a reference to a tied hash, and so has an 'outer' hash that the handle reference points to and an 'inner' hash holding the DBI data. Allowing the inner handle to be changed, for example swapped with a different handle, makes many new behaviors possible. For example, a handle to a database that has crashed could have its inner handle changed to a new connection to a replica.

* It is often useful to know which handle attributes have been changed since the handle was created (e.g., in mod_perl, where a handle needs to be reset or cloned). This will become more important as developers start exploring the ability to change the inner handle.

=head2 High Availability and Load Balancing

* The DBD::Multiplex driver is intended to enable a wide range of dynamic functionality, including support for various high-availability and load-balancing scenarios. The old version has been used successfully but was limited. It's being rewritten to greatly increase its flexibility and has great potential, but development has stalled.

* The DBD::Proxy module is complex and relatively inefficient because it tries to be a complete proxy for most DBI method calls at both the database handle and statement handle levels. For many applications a simpler proxy architecture that operates with a single round-trip to the server would be sufficient (the result rows of SELECT statements would be serialized into the response). Apart from the efficiency gains, that would also enable the use of stateless servers, which in turn makes it possible to use a pool of servers for high availability and load balancing.

I envisage a driver base class that implements everything except the 'transport' mechanism, and then multiple drivers that use the base class with specific transports. For example, one such transport could be the Spread::Queue module.
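
The single-round-trip proxy design described above can be sketched in plain Perl. This is a minimal sketch under stated assumptions, not DBD::Proxy code: the class names My::ProxyBase and My::ProxyInProcess, their selectall() and transport() methods, and the in-process "server" are all hypothetical, and the core Storable module (nfreeze/thaw) stands in for whatever serialization a real driver would choose. The base class does everything except move bytes, a subclass supplies only transport(), and the server keeps no per-client state, so any server in a pool could answer any request.

```perl
    use strict;
    use warnings;

    # Hypothetical base class: implements everything except the transport.
    package My::ProxyBase;
    use Storable qw(nfreeze thaw);

    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }

    # One round trip: ship the SQL and bind values, get all rows back at once.
    sub selectall {
        my ($self, $sql, @bind) = @_;
        my $request  = nfreeze({ sql => $sql, bind => \@bind });
        my $response = thaw($self->transport($request));
        die "server error: $response->{error}\n" if $response->{error};
        return $response->{rows};
    }

    # Subclasses must supply the actual transport.
    sub transport {
        my $self = shift;
        die ref($self) . " must implement transport()\n";
    }

    # Hypothetical transport that calls a server-side handler in-process;
    # a real one would use a socket, HTTP or a message queue instead.
    package My::ProxyInProcess;
    our @ISA = ('My::ProxyBase');

    sub transport {
        my ($self, $frozen_request) = @_;
        return $self->{handler}->($frozen_request);
    }

    package main;
    use Storable qw(nfreeze thaw);

    # Stateless stand-in server: every request carries all it needs.
    # A real server would prepare and execute $req->{sql} through the DBI;
    # this one only looks the bound ids up in an in-memory table.
    my %table  = (1 => 'alpha', 2 => 'beta', 3 => 'gamma');
    my $server = sub {
        my $req  = thaw(shift);
        my @rows = map { [ $_, $table{$_} ] }
                   grep { exists $table{$_} } @{ $req->{bind} };
        return nfreeze({ rows => \@rows });
    };

    my $proxy = My::ProxyInProcess->new(handler => $server);
    my $rows  = $proxy->selectall('SELECT id, name FROM t WHERE id IN (?, ?)', 1, 3);
    print join(', ', map { join('=', @$_) } @$rows), "\n";    # prints "1=alpha, 3=gamma"
```

Under this sketch, supporting another transport (a message queue such as Spread::Queue, for instance) would mean writing one more subclass that overrides transport(); selectall() and the serialization format stay unchanged.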

=head2 Extensibility

The DBI can be extended in three main dimensions: subclassing the DBI, subclassing a driver, and callback hooks. Each has different pros and cons, and each is most applicable in different situations.

* Subclassing the DBI is functional but not well defined, and some key elements are incomplete, particularly the DbTypeSubclass mechanism (which automatically subclasses to a class tree corresponding to the type of database being used). It also needs more thorough testing.

* Subclassing a driver is undocumented, poorly tested and very probably incomplete. However, it's a powerful way to embed certain kinds of functionality 'below' applications while avoiding some of the side-effects of subclassing the DBI (especially in relation to error handling).

* Callbacks are currently limited to error handling (the HandleError and HandleSetErr attributes). Providing callback hooks for more events, such as a row being fetched, would enable utility modules, for example, to add functionality independent of any subclassing in use.

=head2 Database Portability

* The DBI has not yet addressed the issue of portability among SQL dialects. This is the main hurdle in the way of database portability for the DBI. The goal is not to fully parse the SQL and rewrite it in a different dialect; that's well beyond the scope of the DBI and should be left to layered modules. However, a simple token-rewriting mechanism for five comment styles, two quoting styles and four placeholder styles, plus the ODBC "{foo ...}" escape syntax, would be sufficient to significantly raise the level of SQL portability.

* Another major problem area is date/time formatting. Since version 1.41 the DBI has defined a way to express that dates should be fetched in SQL standard date format (YYYY-MM-DD). However, it requires the bind_col() method to be called on applicable columns.
This is one example of the more general case where bind_col() needs to be called with particular attributes on all columns of a particular type. A mechanism is needed whereby an application can specify default bind_col() attributes for each column type. Then, in a single step, all DATE type columns, for example, could be set to be returned in the standard format.

=head2 Debug-ability

* Reduce the "noise" when the trace level is set high by moving more of the trace output under the control of the new named-topic trace mechanism.

* Calls to XS functions (such as many DBI and driver methods) don't normally appear in the call stack. Optionally making them appear would enable more useful diagnostics to be produced.

* Integration with the Perl debugger would make it simpler to perform actions on a per-handle basis (such as breaking on execute or on error).

=head2 Other Enhancements

* Support non-blocking mode for drivers that can enable it in their client API.

* Scroll-cursor support.

=head2 Parrot and Perl 6

The current DBI implementation in C code is very unlikely to run on Perl 6. Perl 6 will target the Parrot virtual machine, and so its internal architecture will be radically different from that of Perl 5. The most natural language in which to implement Perl 6 extensions will be Parrot Intermediate Representation (PIR). Since Parrot includes a Native Call Interface, extensions implemented in PIR should not need a compiler in order to interface to database client API shared libraries.

It is a goal of the Parrot project to be a suitable target for many dynamic languages (including Python, PHP, Ruby, etc.) and to enable those languages to reuse each other's modules. So a database interface for Parrot is also a database interface for all those languages. The Perl DBI is more mature and featureful than the database interfaces of the other languages, and so would make an excellent base for the Parrot database interface.

My plan is to better define the API between the DBI and the drivers, and to use that API as the primary API for the 'raw' Parrot database interface. This project is known as Parrot DBDI. Here's my announcement: http://groups.google.com/[EMAIL PROTECTED] (The project stalled, due to Parrot not having key functionality at the time, and has yet to be restarted.)

The bulk of the DBI code actually exists in base classes 'behind' the driver API. The method dispatcher code that Perl applications interface with is relatively small. Each language targeting Parrot would implement its own small language-specific dispatcher over the Parrot DBDI interface. A "big win" here is that a much wider community of developers would share the same database drivers, and so the benefits of the Open Source model would be magnified.

The bulk of the work will be translating the C and Perl base class code into Parrot PIR, or into a suitable language that generates PIR.

=head1 PRIORITIES

The foundations of many of the changes described above require changes to the interface between the DBI and drivers. To clearly define the transition point, the source code will be forked into a DBI v1 branch and the mainline bumped to v2.

DBI v1 will continue to be maintained for bug fixes and any enhancements that ease the transition to DBI v2.

=head2 Transition Drivers

The first priority is to make all the infrastructure changes that impact drivers and to make an alpha release available that driver authors can target. As far as possible, the changes will be implemented in a way that enables driver authors to use the same code base for DBI v1 and DBI v2.

The main changes required by driver authors are:

* Code changes for PERL_NO_GET_CONTEXT, plus removing PERL_POLLUTE and DBIS

* Code changes in the DBI/DBD interface (new way to create handles, new callbacks, etc.)

* Common test suite infrastructure (driver-specific test base class)

=head2 Transition Applications

At the same time, a small set of incompatible changes that may impact some applications will also be made. See http://svn.perl.org/modules/dbi/trunk/ToDo (login guest/guest).

=head2 Incremental Developments

Once DBI v2.0 is available, the other enhancements can be implemented incrementally on the updated foundations. The priorities of those changes can be set in the light of the circumstances at that time.

=cut