Author: timbo
Date: Tue Nov 9 09:05:45 2004
New Revision: 570
Modified:
dbi/trunk/Changes
dbi/trunk/DBI.pm
dbi/trunk/Roadmap.pod
Log:
Added data_string_diff(), data_string_desc() and data_diff()
Reorg Roadmap
Modified: dbi/trunk/Changes
==============================================================================
--- dbi/trunk/Changes (original)
+++ dbi/trunk/Changes Tue Nov 9 09:05:45 2004
@@ -11,6 +11,10 @@
Fixed test.pl Win32 undef warning thanks to H.Merijn Brand & David Repko.
Updated Roadmap and ToDo
+ Added data_string_diff(), data_string_desc() and data_diff()
+ utility functions to help diagnose Unicode issues.
+XXX needs docs
+
=head2 Changes in DBI 1.45 (svn rev 480), 6th October 2004
Fixed DBI::DBD code for drivers broken in 1.44.
Modified: dbi/trunk/DBI.pm
==============================================================================
--- dbi/trunk/DBI.pm (original)
+++ dbi/trunk/DBI.pm Tue Nov 9 09:05:45 2004
@@ -981,6 +981,7 @@
return @ds;
}
+
sub neat_list {
my ($listref, $maxlen, $sep) = @_;
$maxlen = 0 unless defined $maxlen; # 0 == use internal default
@@ -1007,6 +1008,75 @@
}
+sub data_diff {
+ my ($a, $b) = @_;
+ require utf8;
+
+ # hacks to cater for perl 5.6 for data_string_diff() & data_string_desc()
+ *utf8::is_utf8 = sub {
+ return (DBI::neat(shift) =~ /^"/); # XXX ugly hack, sufficient here
+ } unless defined &utf8::is_utf8;
+ *utf8::valid = sub { 1 } unless defined &utf8::valid;
+
+ my $a_desc = data_string_desc($a);
+ my $b_desc = data_string_desc($b);
+ my $diff = data_string_diff($a, $b);
+
+ return "" if !$diff && $a_desc eq $b_desc;
+
+ return "\$a: $a_desc\n\$b: $b_desc\n$diff";
+}
+
+
+sub data_string_diff {
+ # Compares 'logical' characters, not bytes, so a latin1 string and
+ # an equivalent unicode string will compare as equal even though their
+ # byte encodings are different.
+ my ($a, $b) = @_;
+ my @a_chars = (utf8::is_utf8($a)) ? unpack("U*", $a) : unpack("C*", $a);
+ my @b_chars = (utf8::is_utf8($b)) ? unpack("U*", $b) : unpack("C*", $b);
+ my $i = 0;
+ while (@a_chars && @b_chars) {
+ ++$i, shift(@a_chars), shift(@b_chars), next
+ if $a_chars[0] == $b_chars[0];# compare ordinal values
+ my @desc = map {
+ $_ > 255 ? # if wide character...
+ sprintf("\\x{%04X}", $_) : # \x{...}
+ chr($_) =~ /[[:cntrl:]]/ ? # else if control character ...
+ sprintf("\\x%02X", $_) : # \x..
+ chr($_) # else as themselves
+ } ($a_chars[0], $b_chars[0]);
+ # highlight probable double-encoding?
+ foreach my $c ( @desc ) {
+ next unless $c =~ m/\\x\{08(..)}/;
+ $c .= "='" .chr(hex($1)) ."'"
+ }
+ return sprintf "Strings differ at index $i: a[$i]=$desc[0], b[$i]=$desc[1]\n";
+ }
+ return "String a truncated after $i characters\n" if @b_chars;
+ return "String b truncated after $i characters\n" if @a_chars;
+ return "";
+}
+
+sub data_string_desc { # describe a data string
+ my ($a) = @_;
+ require utf8;
+ require bytes;
+ # Give sufficient info to help diagnose at least these kinds of situations:
+ # - valid UTF8 byte sequence but UTF8 flag not set
+ # (might be ascii so also need to check for hibit to make it worthwhile)
+ # - UTF8 flag set but invalid UTF8 byte sequence
+ # could do better here, but this'll do for now
+ my $is_ascii = $a =~ m/^[\000-\177]*$/;
+ return sprintf "UTF8 %s%s, %s, %d characters %d bytes",
+ utf8::is_utf8($a) ? "on" : "off",
+ utf8::valid($a) ? "" : " but INVALID encoding",
+ $is_ascii ? "ASCII" : "Non-ASCII",
+ length($a), bytes::length($a);
+}
+
+#BEGIN { die data_diff("foox", "foo\x{083a}bar")}
+
sub connect_test_perf {
my($class, $dsn,$dbuser,$dbpass, $attr) = @_;
Modified: dbi/trunk/Roadmap.pod
==============================================================================
--- dbi/trunk/Roadmap.pod (original)
+++ dbi/trunk/Roadmap.pod Tue Nov 9 09:05:45 2004
@@ -38,36 +38,41 @@
=head1 CHANGES AND ENHANCEMENTS
-=head2 Batch Statements
-
-Batch statements are a sequence of SQL statements, or a stored procedure
-containing a sequence of SQL statements, which can be executed as a whole.
+These are grouped into categories and are not listed in any particular order.
-Currently the DBI has no standard interface for dealing with multiple
-results from batch statements. After considerable discussion, an
-interface design has been agreed upon with driver authors, but has
-not yet been implemented.
+=head2 Performance
-These changes would enable greater application portability between
-databases, and greater performance for databases that directly
-support batch statements.
+The DBI has always treated performance as a priority. Some parts of the
+implementation, however, remain unoptimized, especially in relation to threads.
-=head2 Unicode
+* When the DBI is used with a Perl built with thread support enabled
+(such as for Apache mod_perl 2, and some common Linux distributions)
+it runs significantly slower. There are two reasons for this and both
+can be fixed but require non-trivial changes to both the DBI and drivers.
-Use of Unicode with the DBI is growing rapidly. The DBI should do more
-to help drivers support Unicode and help applications work with drivers
-that don't yet support Unicode directly.
+* Connection pooling in a threaded application, such as mod_perl, is
+difficult because DBI handles cannot be passed between threads.
+An alternative mechanism for passing connections between threads
+has been defined, and an experimental connection pool module
+implemented using it, but development has stalled.
-* Define expected behavior for fetching data and binding parameters.
+* The majority of DBI handle creation code is implemented in Perl.
+Moving most of this to C will speed up handle creation significantly.
-* Provide interfaces to support Unicode issues for XS and pure Perl drivers
-and applications.
+* The popular fetchrow_hashref() method is many times slower than
+fetchrow_arrayref(). It has to get the names of the columns, then create and
+load a new hash each time. A $h->{FetchHashReuse} attribute would allow the
+same hash to be reused each time making fetchrow_hashref() about the same speed
+as fetchrow_arrayref().
-* Provide functions for applications to help diagnose inconsistencies
-between byte string contents and setting of the SvUTF8 flag.
+* Support for asynchronous (non-blocking) DBI method calls would enable
+applications to continue processing in parallel with database activity.
+This is also relevant for GUI and other event-driven applications.
+The DBI needs to define a standard interface for this so drivers can
+implement it in a portable way, where possible.
-These changes would smooth the transition to Unicode for many
-applications and drivers.
+These changes would significantly enhance the performance of the
+DBI and many applications which use the DBI.
=head2 Testing
@@ -102,39 +107,64 @@
These changes would improve the quality of all applications using the DBI.
-=head2 Performance
+=head2 High Availability and Load Balancing
-The DBI has always treated performance as a priority. Some parts of the
-implementation, however, remain unoptimized, especially in relation to threads.
+* The DBD::Multiplex driver provides a framework to enable a wide range of
+dynamic functionality, including support for high-availability, load-balancing,
+caching, and access to distributed data. It is currently being rewritten to
+greatly increase its flexibility and has potential to be a very powerful tool,
+but development has stalled.
-* When the DBI is used with a Perl built with thread support enabled
-(such as for Apache mod_perl 2, and some common Linux distributions)
-it runs significantly slower. There are two reasons for this and both
-can be fixed but require non-trivial changes to both the DBI and drivers.
+* The DBD::Proxy module is complex and relatively inefficient because
+it's trying to be a complete proxy for most DBI method calls. For many
+applications a simpler proxy architecture that operates with a single
+round-trip to the server would be sufficient and preferable.
-* Connection pooling in a threaded application, such as mod_perl, is
-difficult because DBI handles cannot be passed between threads.
-An alternative mechanism for passing connections between threads
-has been defined, and an experimental connection pool module
-implemented using it, but development has stalled.
+New proxy client and server classes are needed, which could be
+subclassed to support specific client to server transport mechanisms
+(such as HTTP and Spread::Queue). Apart from the efficiency gains,
+this would also enable the use of a load-balanced pool of stateless
+servers.
-* The majority of DBI handle creation code is implemented in Perl.
-Moving most of this to C will speed up handle creation significantly.
+* The DBI currently offers no support for distributed transactions.
+The most useful elements of the XA distributed transaction interface
+standard could be included in the DBI specification. Drivers for databases
+which support distributed transactions could then be extended to support it.
-* The popular fetchrow_hashref() method is many times slower than
-fetchrow_arrayref(). It has to get the names of the columns, then create and
-load a new hash each time. A $h->{FetchHashReuse} attribute would allow the
-same hash to be reused each time making fetchrow_hashref() about the same speed
-as fetchrow_arrayref().
+These changes would enable new kinds of DBI applications for critical environments.
-* Support for asynchronous (non-blocking) DBI method calls would enable
-applications to continue processing in parallel with database activity.
-This is also relevant for GUI and other event-driven applications.
-The DBI needs to define a standard interface for this so drivers can
-implement it in a portable way, where possible.
-These changes would significantly enhance the performance of the
-DBI and many applications which use the DBI.
+=head2 Unicode
+
+Use of Unicode with the DBI is growing rapidly. The DBI should do more
+to help drivers support Unicode and help applications work with drivers
+that don't yet support Unicode directly.
+
+* Define expected behavior for fetching data and binding parameters.
+
+* Provide interfaces to support Unicode issues for XS and pure Perl drivers
+and applications.
+
+* Provide functions for applications to help diagnose inconsistencies
+between byte string contents and setting of the SvUTF8 flag.
+
+These changes would smooth the transition to Unicode for many
+applications and drivers.
+
+
+=head2 Batch Statements
+
+Batch statements are a sequence of SQL statements, or a stored procedure
+containing a sequence of SQL statements, which can be executed as a whole.
+
+Currently the DBI has no standard interface for dealing with multiple
+results from batch statements. After considerable discussion, an
+interface design has been agreed upon with driver authors, but has
+not yet been implemented.
+
+These changes would enable greater application portability between
+databases, and greater performance for databases that directly
+support batch statements.
=head2 Introspection
@@ -164,33 +194,6 @@
advanced uses of the DBI.
-=head2 High Availability and Load Balancing
-
-* The DBD::Multiplex driver provides a framework to enable a wide range of
-dynamic functionality, including support for high-availability, load-balancing,
-caching, and access to distributed data. It is currently being rewritten to
-greatly increase its flexibility and has potential to be a very powerful tool,
-but development has stalled.
-
-* The DBD::Proxy module is complex and relatively inefficient because
-it's trying to be a complete proxy for most DBI method calls. For many
-applications a simpler proxy architecture that operates with a single
-round-trip to the server would be sufficient and preferable.
-
-New proxy client and server classes are needed, which could be
-subclassed to support specific client to server transport mechanisms
-(such as HTTP and Spread::Queue). Apart from the efficiency gains,
-this would also enable the use of a load-balanced pool of stateless
-servers.
-
-* The DBI currently offers no support for distributed transactions.
-The most useful elements of the standard XA distributed transaction interface
-standard could be included in the DBI specification. Drivers for databases
-which support distributed transactions could then be extended to support it.
-
-These changes would enable new kinds of DBI applications.
-
-
=head2 Extensibility
The DBI can be extended in three main dimensions: subclassing the
@@ -217,6 +220,25 @@
applications, layered modules, and the DBI.
+=head2 Debugability
+
+* Enabling DBI trace output at a high level of detail causes a large volume of
+output, much of it unrelated to the problem being investigated. More trace
+output should be controlled by the new named-topic mechanism instead of the
+trace level.
+
+* Calls to XS functions (such as many DBI and driver methods) don't
+normally appear in the call stack. Optionally enabling that would
+enable more useful diagnostics to be produced.
+
+* Integration with the Perl debugger would make it simpler to perform
+actions on a per-handle basis (such as breakpoint on execute,
+breakpoint on error).
+
+These changes would enable more rapid application development and
+fault finding.
+
+
=head2 Database Portability
* The DBI has not yet addressed the issue of portability among SQL
@@ -245,33 +267,17 @@
and greater functionality for layered modules.
-=head2 Debugability
-
-* Enabling DBI trace output at a high level of detail causes a large volume of
-output, much of it unrelated to the problem being investigated. More trace
-output should be controlled by the new named-topic mechanism instead of the
-trace level.
+=head2 Intellectual Property
-* Calls to XS functions (such as many DBI and driver methods) don't
-normally appear in the call stack. Optionally enabling that would
-enable more useful diagnostics to be produced.
+* Clarify current intellectual property status, including a review
+ of past contributions to ensure the DBI is unencumbered.
-* Integration with the Perl debugger would make it simpler to perform
-actions on a per-handle basis (such as breakpoint on execute,
-breakpoint on error).
-
-These changes would enable more rapid application development and
-fault finding.
+* Establish a procedure for vetting future contributions for any
+ intellectual property issues.
=head2 Other Enhancements
-* Clarify current intellectual property status, including a review
- of past contributions.
-
-* Establishing a procedure for vetting future contributions for any
- intellectual property issues.
-
* Reduce the work needed to create new database interface drivers.
* Definition of an interface to support scrollable cursors.
@@ -279,7 +285,7 @@
=head2 Parrot and Perl 6
-The current DBI implementation in C code is very unlikely to run on Perl 6.
+The current DBI implementation in C code is unlikely to run on Perl 6.
Perl 6 will target the Parrot virtual machine and so the internal architecture
will be radically different from Perl 5.
@@ -350,12 +356,14 @@
Once DBI v2.0 is available, the other enhancements can be implemented
incrementally on the updated foundations. Priorities for those
-changes have not yet been set.
+changes have not yet been set. If your company would benefit from
+a specific feature it could pay to sponsor early development of it.
+
=head1 RESOURCES AND CONTRIBUTIONS
This roadmap does not address the resources required to implement
-in a timely manner the changes for DBI v2.0 and beyond.
+the changes for DBI v2.0 and beyond.
See L<http://dbi.perl.org/contributing> for I<how you can help>.
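
The utility functions added to DBI.pm in this revision compare strings by logical character rather than by byte, and report the UTF8 flag alongside the character and byte counts. The standalone sketch below illustrates that idea using only core Perl. It does not call the DBI: describe() and first_logical_diff() are hypothetical helpers written for this example, and they assume Perl 5.8.1 or later, where utf8::is_utf8(), utf8::upgrade() and utf8::encode() are built in.

```perl
use strict;
use warnings;
use bytes ();    # loads bytes::length() without enabling byte semantics

# Hypothetical helper, not part of the DBI: summarise the UTF8 flag,
# character count and byte count of a string.
sub describe {
    my ($s) = @_;
    return sprintf "UTF8 %s, %d characters %d bytes",
        utf8::is_utf8($s) ? "on" : "off", length($s), bytes::length($s);
}

# Hypothetical helper, not part of the DBI: index of the first differing
# logical character (or where the shorter string ends), -1 if equal.
sub first_logical_diff {
    my ($x, $y) = @_;
    my @x = map { ord } split //, $x;    # ordinals of characters, not bytes
    my @y = map { ord } split //, $y;
    my $i = 0;
    $i++ while $i < @x && $i < @y && $x[$i] == $y[$i];
    return ($i == @x && $i == @y) ? -1 : $i;
}

my $latin1   = "caf\xE9";       # e-acute stored as a single byte
my $upgraded = $latin1;
utf8::upgrade($upgraded);       # same characters, UTF8 flag now on
my $double   = $upgraded;
utf8::encode($double);          # UTF-8 bytes now treated as characters

print describe($_), "\n" for $latin1, $upgraded, $double;
print first_logical_diff($latin1, $upgraded), "\n";    # logically equal
print first_logical_diff($latin1, $double), "\n";      # differs at the e-acute
```

The latin1 and upgraded strings have different byte encodings but the same characters, so they compare as equal. The third string is the kind of accidental double encoding that a character-level comparison exposes and a byte count alone can hide.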