Author: timbo
Date: Mon Aug 9 15:22:59 2004
New Revision: 427
Added: dbi/trunk/Roadmap
Log: Add DBI Roadmap document (newly written, draft)
Added: dbi/trunk/Roadmap
==============================================================================
--- (empty file)
+++ dbi/trunk/Roadmap Mon Aug 9 15:22:59 2004
@@ -0,0 +1,304 @@
+=head1 DBI ROAD-MAP
+
+9th August 2004
+
+This document aims to provide a high-level overview of the future direction of the DBI.
+
+It outlines the broad categories of changes, along with some rationale,
+but does not go into implementation details and ignores many more
+minor planned enhancements. More details can be found in:
+
+  http://svn.perl.org/modules/dbi/trunk/ToDo
+
+(username guest, password guest)
+
+
+=head2 Unicode
+
+Use of Unicode with the DBI is growing rapidly. The DBI could do more
+to help drivers support Unicode and help applications work with drivers
+that don't yet support Unicode directly.
+
+* Define expected behavior for fetching data and binding parameters.
+
+* Fix 'leaking' of UTF8 flag from one row to the next.
+
+* Provide interfaces to support Unicode issues for XS and pure Perl drivers
+and applications.
+
+
+=head2 Testing
+
+The DBI has a test suite. Every driver has a test suite. Each is
+limited in its scope. The driver test suite is testing for behavior
+that matches what the driver author thinks the DBI specifies but
+may be subtly incorrect. These test suites are poorly maintained
+because the development cost is relatively high compared to the
+"return" from a single driver.
+
+A common test suite that can be reused by all the drivers is needed.
+The benefits include:
+
+* Ensuring all drivers conform to the DBI specification.
+Easing the porting of applications between databases and the
+implementation of database independent reusable code modules
+layered over the DBI.
+
+* Improving the coverage of the DBI and driver code tested by the
+test suite. Driver authors and others will be more motivated to
+contribute to the common test suite as the gains are multiplied by
+the number of drivers in use.
+
+* Improving the DBI specification by prompting the clarification
+of fuzzy issues in order to implement test cases.
+
+* Automatic documentation about driver functionality can be generated
+by the testing process. Areas of missing functionality can be
+highlighted to encourage enhancements.
+
+* Improving the testing of DBI subclassing, DBI::PurePerl and the
+various "plumbing" drivers, such as DBD::Proxy and DBD::Multiplex,
+by automatically running the test suite through them.
+
+
+=head2 Performance
+
+The DBI has always treated performance as a priority. However, some
+parts of the implementation remain unoptimized, especially in
+relation to threads.
+
+* The mechanism by which drivers access the core "DBI State" structure
+(DBIS) is very inefficient when perl is built to support threads
+(including mod_perl 2).
+
+* The PERL_NO_GET_CONTEXT mechanism is not used by the DBI or drivers
+so their use of Perl API functions is significantly less efficient.
+
+* The majority of the handle creation code, including TIEHASH, is
+implemented in Perl. Moving most of this to C will speed up handle
+creation significantly.
+
+* The popular fetchrow_hashref() method is many times slower than
+fetchrow_arrayref() because it has to re-get the names of the fields
+each time. A $h->{FetchHashReuse} attribute would allow the same
+hash to be reused each time making fetchrow_hashref() about the
+same speed as fetchrow_arrayref().
+
+
+=head2 Introspection
+
+* The methods of the DBI API are installed dynamically when the DBI
+is loaded. The data structure used to define the methods and their
+dispatch behavior should be made part of the DBI API. This would
+enable more flexible and correct behavior by both modules subclassing
+the DBI and especially dynamic drivers such as DBD::Proxy and
+DBD::Multiplex.
+
+* All the handle attributes and related 'metadata' should also be
+made available for the same reasons. It's common for DBD::Proxy,
+for example, to not treat new attributes correctly because it's not
+been taught about them.
+
+* Currently it is not possible to discover all the child statement
+handles that belong to a database handle (or all database handles
+that belong to a driver handle). This makes certain tasks more
+difficult, especially some debugging scenarios. A cache of
+weak-references to child handles would solve the problem without
+creating reference loops.
+
+* A DBI handle is a reference to a tied hash and so has an 'outer'
+hash that the handle reference points to and an 'inner' hash holding
+the DBI data. By allowing the inner handle to be changed, for
+example swapped with a different handle, many new behaviors become
+possible. For example a database handle to a database that's crashed
+could have its inner handle changed to a new connection to a replica.
+
+* It is often useful to know what handle attributes have been changed
+since the handle was created (e.g., in mod_perl where a handle needs
+to be reset or cloned). This will become more important as developers
+start exploring the ability to change the inner handle.
+
+
+=head2 High Availability and Load Balancing
+
+* The DBD::Multiplex driver is intended to enable a wide range of
+dynamic functionality including support for various high-availability
+and load-balancing scenarios. The old version has been used
+successfully but was limited. It's being rewritten to greatly
+increase its flexibility and has great potential, but development
+has stalled.
+
+* The DBD::Proxy module is complex and relatively inefficient because
+it's trying to be a complete proxy for most DBI method calls at
+both the database handle and statement handle levels. For many
+applications a simpler proxy architecture that operates with a
+single round-trip to the server would be sufficient (result rows
+of SELECT statements would be serialized into the response).
+
+Apart from efficiency gains, that would also enable the use of
+stateless servers which then enables the use of a pool of servers
+for high-availability and load balancing.
+
+I envisage a driver base class that implements everything except
+the 'transport' mechanism and then multiple drivers using the base
+class with specific transports. For example, one such transport
+could be the Spread::Queue module.
+
+
+=head2 Extensibility
+
+The DBI can be extended in three main dimensions: subclassing the
+DBI, subclassing a driver, and callback hooks. Each has different
+pros and cons and each is most applicable in different situations.
+
+* Subclassing the DBI is functional but not well defined and some
+key elements are incomplete, particularly the DbTypeSubclass mechanism
+(that automatically subclasses to a class tree corresponding to the
+type of database being used). It also needs more thorough testing.
+
+* Subclassing a driver is undocumented, poorly tested and very
+probably incomplete. However it's a powerful way to embed certain
+kinds of functionality 'below' applications while avoiding some of
+the side-effects of subclassing the DBI (especially in relation to
+error handling).
+
+* Callbacks are currently limited to error handling (the HandleError
+and HandleSetError attributes). Providing callback hooks for more
+events, such as a row being fetched, would enable utility modules,
+for example, to add functionality independent of any subclassing
+in use.
+
+
+=head2 Database Portability
+
+* The DBI has not yet addressed the issue of portability among SQL
+dialects. This is the main hurdle in the way of database portability
+for the DBI.
+
+The goal is not to fully parse the SQL and rewrite it in a different
+dialect. That's well beyond the scope of the DBI and should be
+left to layered modules. However, a simple token rewriting mechanism
+for five comment styles, two quoting styles, four placeholder styles,
+plus the ODBC "{foo ...}" escape syntax is sufficient to significantly
+raise the level of SQL portability.
+
+* Another major problem area is date/time formatting. Since version 1.41
+the DBI has defined a way to express that dates should be fetched
+in SQL standard date format (YYYY-MM-DD). However it requires the
+bind_col() method to be called on applicable columns. This is one
+example of the more general case where bind_col() needs to be called
+with particular attributes on all columns of a particular type.
+
+A mechanism is needed whereby an application can specify default bind_col()
+attributes for each column type. So with a single step all DATE type
+columns, for example, can be set to be returned in the standard format.
+
+
+=head2 Debug-ability
+
+* Reduce the "noise" when the trace level is set high by moving more trace
+output to be enabled by the new named-topic trace mechanism.
+
+* Calls to XS functions (such as many DBI and driver methods) don't
+normally appear in the call stack. Optionally enabling that would
+enable more useful diagnostics to be produced.
+
+* Integration with the Perl debugger would make it simpler to perform
+actions on a per-handle basis (such as breakpoint on execute,
+breakpoint on error).
+
+
+=head2 Other Enhancements
+
+* Support non-blocking mode for drivers that can enable it in their
+client API.
+
+* Scroll-cursor support
+
+
+=head2 Parrot and Perl 6
+
+The current DBI implementation in C code is very unlikely to run
+on Perl 6. Perl 6 will target the Parrot virtual machine and so
+the internal architecture will be radically different from Perl 5.
+
+The most natural language to implement Perl 6 extensions will be
+Parrot Intermediate Representation (PIR). Since Parrot includes a
+Native Call Interface, extensions implemented in PIR should not
+need a compiler in order to interface to database client API shared
+libraries.
+
+It is a goal of the Parrot project to be a suitable target for many
+dynamic languages (including Python, PHP, Ruby, etc) and to enable
+those languages to reuse each other's modules. So a database interface
+for Parrot is also a database interface for all those languages.
+
+The Perl DBI is more mature and featureful than the database
+interfaces of the other languages and so would make an excellent
+base for the Parrot Database interface.
+
+My plan is to better define the API between the DBI and the drivers and
+use that API as the primary API for the 'raw' Parrot database interface.
+This project is known as Parrot DBDI. Here's my announcement:
+
+  http://groups.google.com/[EMAIL PROTECTED]
+
+(The project stalled, due to Parrot not having key functionality
+at the time, and has yet to be restarted.)
+
+The bulk of the DBI code actually exists in base classes 'behind'
+the driver API. The method dispatcher code that Perl applications
+interface with is relatively small.
+
+Each language targeting Parrot would implement its own small
+language-specific dispatcher over the Parrot DBDI interface.
+
+A "big win" here is that a much wider community of developers share
+the same database drivers and so the benefits of the Open Source
+model are magnified.
+
+The bulk of the work will be translating the C and Perl base class
+code into Parrot PIR or a suitable language that generates PIR.
+
+
+=head1 PRIORITIES
+
+The foundations of many of the changes described above require
+changes to the interface between the DBI and drivers. To clearly
+define the transition point the source code will be forked into a
+DBI v1 branch and the mainline bumped to v2.
+
+DBI v1 will continue to be maintained for bug fixes and any
+enhancements that ease the transition to DBI v2.
+
+=head2 Transition Drivers
+
+The first priority is to make all the infrastructure changes that
+impact drivers and make an alpha release available that driver
+authors can target. As far as possible the changes will be implemented
+in a way that enables driver authors to use the same code base for DBI
+v1 and DBI v2.
+
+The main changes required by driver authors are:
+
+* Code changes for PERL_NO_GET_CONTEXT, plus removing PERL_POLLUTE
+and DBIS
+
+* Code changes in DBI/DBD interface (new way to create handles, new
+callbacks etc)
+
+* Common test suite infrastructure (driver-specific test base class)
+
+=head2 Transition Applications
+
+At the same time a small set of incompatible changes that may impact
+some applications will also be made. See
+http://svn.perl.org/modules/dbi/trunk/ToDo (login guest/guest).
+
+=head2 Incremental Developments
+
+Once DBI v2.0 is available the other enhancements can be implemented
+incrementally on the updated foundations. The priorities of those
+changes can be set in the light of then-present circumstances.
+
+=cut
