Author: timbo
Date: Tue Nov  9 09:05:45 2004
New Revision: 570

Modified:
   dbi/trunk/Changes
   dbi/trunk/DBI.pm
   dbi/trunk/Roadmap.pod
Log:
Added data_string_diff() data_string_desc() and data_diff()
Reorg Roadmap


Modified: dbi/trunk/Changes
==============================================================================
--- dbi/trunk/Changes   (original)
+++ dbi/trunk/Changes   Tue Nov  9 09:05:45 2004
@@ -11,6 +11,10 @@
   Fixed test.pl Win32 undef warning thanks to H.Merijn Brand & David Repko.
   Updated Roadmap and ToDo
 
+  Added data_string_diff() data_string_desc() and data_diff()
+    utility functions to help diagnose Unicode issues.
+XXX needs docs
+
 =head2 Changes in DBI 1.45 (svn rev 480),    6th October 2004
 
   Fixed DBI::DBD code for drivers broken in 1.44.

Modified: dbi/trunk/DBI.pm
==============================================================================
--- dbi/trunk/DBI.pm    (original)
+++ dbi/trunk/DBI.pm    Tue Nov  9 09:05:45 2004
@@ -981,6 +981,7 @@
     return @ds;
 }
 
+
 sub neat_list {
     my ($listref, $maxlen, $sep) = @_;
     $maxlen = 0 unless defined $maxlen;        # 0 == use internal default
@@ -1007,6 +1008,75 @@
 }
 
 
+sub data_diff {
+    my ($a, $b) = @_;
+    require utf8;
+
+    # hacks to cater for perl 5.6 for data_string_diff() & data_string_desc()
+    *utf8::is_utf8 = sub {
+        return (DBI::neat(shift) =~ /^"/); # XXX ugly hack, sufficient here
+    } unless defined &utf8::is_utf8;
+    *utf8::valid = sub { 1 } unless defined &utf8::valid;
+
+    my $a_desc = data_string_desc($a);
+    my $b_desc = data_string_desc($b);
+    my $diff   = data_string_diff($a, $b);
+
+    return "" if !$diff && $a_desc eq $b_desc;
+
+    return "\$a: $a_desc\n\$b: $b_desc\n$diff";
+}
+    
+
+sub data_string_diff {
+    # Compares 'logical' characters, not bytes, so a latin1 string and an
+    # equivalent unicode string will compare as equal even though their
+    # byte encodings are different.
+    my ($a, $b) = @_;
+    my @a_chars = (utf8::is_utf8($a)) ? unpack("U*", $a) : unpack("C*", $a);
+    my @b_chars = (utf8::is_utf8($b)) ? unpack("U*", $b) : unpack("C*", $b);
+    my $i = 0;
+    while (@a_chars && @b_chars) {
+       ++$i, shift(@a_chars), shift(@b_chars), next
+           if $a_chars[0] == $b_chars[0];# compare ordinal values
+       my @desc = map {
+           $_ > 255 ?                    # if wide character...
+             sprintf("\\x{%04X}", $_) :  # \x{...}
+             chr($_) =~ /[[:cntrl:]]/ ?  # else if control character ...
+             sprintf("\\x%02X", $_) :    # \x..
+             chr($_)                     # else as themselves
+       } ($a_chars[0], $b_chars[0]);
+       # highlight probable double-encoding?
+        foreach my $c ( @desc ) {
+           next unless $c =~ m/\\x\{08(..)}/;
+           $c .= "='" .chr(hex($1)) ."'"
+       }
+       return sprintf "Strings differ at index %d: a[%d]=%s, b[%d]=%s\n",
+           $i, $i, $desc[0], $i, $desc[1];
+    }
+    return "String a truncated after $i characters\n" if @b_chars;
+    return "String b truncated after $i characters\n" if @a_chars;
+    return "";
+}
+
+sub data_string_desc { # describe a data string
+    my ($a) = @_;
+    require utf8;
+    require bytes;
+    # Give sufficient info to help diagnose at least these kinds of situations:
+    # - valid UTF8 byte sequence but UTF8 flag not set
+    #   (might be ascii so also need to check for hibit to make it worthwhile)
+    # - UTF8 flag set but invalid UTF8 byte sequence
+    # could do better here, but this'll do for now
+    my $is_ascii = $a =~ m/^[\000-\177]*$/;
+    return sprintf "UTF8 %s%s, %s, %d characters %d bytes%s",
+       utf8::is_utf8($a) ? "on" : "off",
+       utf8::valid($a) ? "" : " but INVALID encoding",
+       $is_ascii ? "ASCII" : "Non-ASCII",
+       length($a), bytes::length($a);
+}
+
+#BEGIN { die data_diff("foox", "foo\x{083a}bar")}
+
 
 sub connect_test_perf {
     my($class, $dsn,$dbuser,$dbpass, $attr) = @_;

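The new data_string_diff() compares ordinal character values rather than bytes. The sketch below is a minimal, self-contained illustration of that idea. It is not the DBI code itself, and it assumes Perl 5.8 or later. It calls ord() on each character instead of using the unpack("U*")/unpack("C*") calls from the patch. The helper name first_char_diff() is hypothetical.

```perl
use strict;
use warnings;
use bytes ();   # loads bytes.pm so bytes::length() is callable without enabling the pragma

# The same logical string, "caf" plus e-acute (U+00E9), in two internal representations.
my $latin1   = "caf\x{E9}";   # one byte per character, UTF8 flag off
my $upgraded = $latin1;
utf8::upgrade($upgraded);     # identical characters, now stored as UTF-8, flag on

# Hypothetical helper in the spirit of data_string_diff(): return the index
# of the first differing character, or -1 if the strings are equal.
sub first_char_diff {
    my ($x, $y) = @_;
    my @x = map { ord($_) } split //, $x;   # compare ordinal values, not bytes
    my @y = map { ord($_) } split //, $y;
    my $n = @x < @y ? scalar @x : scalar @y;
    for my $i (0 .. $n - 1) {
        return $i if $x[$i] != $y[$i];
    }
    return @x == @y ? -1 : $n;              # one string is a prefix of the other
}

# Describe each string roughly the way data_string_desc() does.
for my $s ($latin1, $upgraded) {
    printf "UTF8 %s, %d characters %d bytes\n",
        utf8::is_utf8($s) ? "on" : "off", length($s), bytes::length($s);
}
printf "first difference at: %d\n", first_char_diff($latin1, $upgraded);
printf "first difference at: %d\n", first_char_diff("foo", "fox");
```

Both strings hold the same four characters and compare equal with `eq`. However, one occupies 4 bytes and the other 5. That kind of mismatch between the UTF8 flag and the byte content is what data_string_desc() is meant to expose, while a character-level diff correctly reports no difference.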
Modified: dbi/trunk/Roadmap.pod
==============================================================================
--- dbi/trunk/Roadmap.pod       (original)
+++ dbi/trunk/Roadmap.pod       Tue Nov  9 09:05:45 2004
@@ -38,36 +38,41 @@
 
 =head1 CHANGES AND ENHANCEMENTS
 
-=head2 Batch Statements
-
-Batch statements are a sequence of SQL statements, or a stored procedure
-containing a sequence of SQL statements, which can be executed as a whole.
+These are grouped into categories and are not listed in any particular order.
 
-Currently the DBI has no standard interface for dealing with multiple
-results from batch statements.  After considerable discussion, an
-interface design has been agreed upon with driver authors, but has
-not yet been implemented.
+=head2 Performance
 
-These changes would enable greater application portability between
-databases, and greater performance for databases that directly
-support batch statements.
+The DBI has always treated performance as a priority. Some parts of the
+implementation, however, remain unoptimized, especially in relation to threads.
 
-=head2 Unicode
+* When the DBI is used with a Perl built with thread support enabled
+(such as for Apache mod_perl 2, and some common Linux distributions)
+it runs significantly slower. There are two reasons for this and both
+can be fixed but require non-trivial changes to both the DBI and drivers.
 
-Use of Unicode with the DBI is growing rapidly. The DBI should do more
-to help drivers support Unicode and help applications work with drivers
-that don't yet support Unicode directly.
+* Connection pooling in a threaded application, such as mod_perl, is
+difficult because DBI handles cannot be passed between threads.
+An alternative mechanism for passing connections between threads
+has been defined, and an experimental connection pool module
+implemented using it, but development has stalled.
 
-* Define expected behavior for fetching data and binding parameters.
+* The majority of DBI handle creation code is implemented in Perl.
+Moving most of this to C will speed up handle creation significantly.
 
-* Provide interfaces to support Unicode issues for XS and pure Perl drivers
-and applications.
+* The popular fetchrow_hashref() method is many times slower than
+fetchrow_arrayref(). It has to get the names of the columns, then create and
+load a new hash each time. A $h->{FetchHashReuse} attribute would allow the
+same hash to be reused each time making fetchrow_hashref() about the same speed
+as fetchrow_arrayref().
 
-* Provide functions for applications to help diagnose inconsistencies
-between byte string contents and setting of the SvUTF8 flag.
+* Support for asynchronous (non-blocking) DBI method calls would enable
+applications to continue processing in parallel with database activity.
+This is also relevant for GUI and other event-driven applications.
+The DBI needs to define a standard interface for this so drivers can
+implement it in a portable way, where possible.
 
-These changes would smooth the transition to Unicode for many
-applications and drivers.
+These changes would significantly enhance the performance of the
+DBI and many applications which use the DBI.
 
 
 =head2 Testing
@@ -102,39 +107,64 @@
 These changes would improve the quality of all applications using the DBI.
 
 
-=head2 Performance
+=head2 High Availability and Load Balancing
 
-The DBI has always treated performance as a priority. Some parts of the
-implementation, however, remain unoptimized, especially in relation to threads.
+* The DBD::Multiplex driver provides a framework to enable a wide range of
+dynamic functionality, including support for high-availability, load-balancing,
+caching, and access to distributed data.  It is currently being rewritten to
+greatly increase its flexibility and has potential to be a very powerful tool,
+but development has stalled.
 
-* When the DBI is used with a Perl built with thread support enabled
-(such as for Apache mod_perl 2, and some common Linux distributions)
-it runs significantly slower. There are two reasons for this and both
-can be fixed but require non-trivial changes to both the DBI and drivers.
+* The DBD::Proxy module is complex and relatively inefficient because
+it's trying to be a complete proxy for most DBI method calls.  For many
+applications a simpler proxy architecture that operates with a single
+round-trip to the server would be sufficient and preferable.
 
-* Connection pooling in a threaded application, such as mod_perl, is
-difficult because DBI handles cannot be passed between threads.
-An alternative mechanism for passing connections between threads
-has been defined, and an experimental connection pool module
-implemented using it, but development has stalled.
+New proxy client and server classes are needed, which could be
+subclassed to support specific client to server transport mechanisms
+(such as HTTP and Spread::Queue).  Apart from the efficiency gains,
+this would also enable the use of a load-balanced pool of stateless
+servers.
 
-* The majority of DBI handle creation code is implemented in Perl.
-Moving most of this to C will speed up handle creation significantly.
+* The DBI currently offers no support for distributed transactions.
+The most useful elements of the XA distributed transaction interface
+standard could be included in the DBI specification.  Drivers for databases
+which support distributed transactions could then be extended to support it.
 
-* The popular fetchrow_hashref() method is many times slower than
-fetchrow_arrayref(). It has to get the names of the columns, then create and
-load a new hash each time. A $h->{FetchHashReuse} attribute would allow the
-same hash to be reused each time making fetchrow_hashref() about the same speed
-as fetchrow_arrayref().
+These changes would enable new kinds of DBI applications for critical environments.
 
-* Support for asynchronous (non-blocking) DBI method calls would enable
-applications to continue processing in parallel with database activity.
-This is also relevant for GUI and other event-driven applications.
-The DBI needs to define a standard interface for this so drivers can
-implement it in a portable way, where possible.
 
-These changes would significantly enhance the performance of the
-DBI and many applications which use the DBI.
+=head2 Unicode
+
+Use of Unicode with the DBI is growing rapidly. The DBI should do more
+to help drivers support Unicode and help applications work with drivers
+that don't yet support Unicode directly.
+
+* Define expected behavior for fetching data and binding parameters.
+
+* Provide interfaces to support Unicode issues for XS and pure Perl drivers
+and applications.
+
+* Provide functions for applications to help diagnose inconsistencies
+between byte string contents and setting of the SvUTF8 flag.
+
+These changes would smooth the transition to Unicode for many
+applications and drivers.
+
+
+=head2 Batch Statements
+
+Batch statements are a sequence of SQL statements, or a stored procedure
+containing a sequence of SQL statements, which can be executed as a whole.
+
+Currently the DBI has no standard interface for dealing with multiple
+results from batch statements.  After considerable discussion, an
+interface design has been agreed upon with driver authors, but has
+not yet been implemented.
+
+These changes would enable greater application portability between
+databases, and greater performance for databases that directly
+support batch statements.
 
 
 =head2 Introspection
@@ -164,33 +194,6 @@
 advanced uses of the DBI.
 
 
-=head2 High Availability and Load Balancing
-
-* The DBD::Multiplex driver provides a framework to enable a wide range of
-dynamic functionality, including support for high-availability, load-balancing,
-caching, and access to distributed data.  It is currently being rewritten to
-greatly increase its flexibility and has potential to be a very powerful tool,
-but development has stalled.
-
-* The DBD::Proxy module is complex and relatively inefficient because
-it's trying to be a complete proxy for most DBI method calls.  For many
-applications a simpler proxy architecture that operates with a single
-round-trip to the server would be sufficient and preferable.
-
-New proxy client and server classes are needed, which could be
-subclassed to support specific client to server transport mechanisms
-(such as HTTP and Spread::Queue).  Apart from the efficiency gains,
-this would also enable the use of a load-balanced pool of stateless
-servers.
-
-* The DBI currently offers no support for distributed transactions.
-The most useful elements of the standard XA distributed transaction interface
-standard could be included in the DBI specification.  Drivers for databases
-which support distributed transactions could then be extended to support it.
-
-These changes would enable new kinds of DBI applications.
-
-
 =head2 Extensibility
 
 The DBI can be extended in three main dimensions: subclassing the
@@ -217,6 +220,25 @@
 applications, layered modules, and the DBI.
 
 
+=head2 Debugability
+
+* Enabling DBI trace output at a high level of detail causes a large volume of
+output, much of it unrelated to the problem being investigated. More trace
+output should be controlled by the new named-topic mechanism instead of the
+trace level.
+
+* Calls to XS functions (such as many DBI and driver methods) don't
+normally appear in the call stack.  Optionally including them would
+allow more useful diagnostics to be produced.
+
+* Integration with the Perl debugger would make it simpler to perform
+actions on a per-handle basis (such as breakpoint on execute,
+breakpoint on error).
+ 
+These changes would enable more rapid application development and
+fault finding.
+
+
 =head2 Database Portability
 
 * The DBI has not yet addressed the issue of portability among SQL
@@ -245,33 +267,17 @@
 and greater functionality for layered modules.
 
 
-=head2 Debugability
-
-* Enabling DBI trace output at a high level of detail causes a large volume of
-output, much of it unrelated to the problem being investigated. More trace
-output should be controlled by the new named-topic mechanism instead of the
-trace level.
+=head2 Intellectual Property
 
-* Calls to XS functions (such as many DBI and driver methods) don't
-normally appear in the call stack.  Optionally enabling that would
-enable more useful diagnostics to be produced.
+* Clarify current intellectual property status, including a review
+  of past contributions to ensure the DBI is unencumbered.
 
-* Integration with the Perl debugger would make it simpler to perform
-actions on a per-handle basis (such as breakpoint on execute,
-breakpoint on error).
- 
-These changes would enable more rapid application development and
-fault finding.
+* Establish a procedure for vetting future contributions for any
+  intellectual property issues.
 
 
 =head2 Other Enhancements
 
-* Clarify current intellectual property status, including a review
-  of past contributions.
-
-* Establishing a procedure for vetting future contributions for any
-  intellectual property issues.
-
 * Reduce the work needed to create new database interface drivers.
 
 * Definition of an interface to support scrollable cursors.
@@ -279,7 +285,7 @@
 
 =head2 Parrot and Perl 6
 
-The current DBI implementation in C code is very unlikely to run on Perl 6.
+The current DBI implementation in C code is unlikely to run on Perl 6.
 Perl 6 will target the Parrot virtual machine and so the internal architecture
 will be radically different from Perl 5.
 
@@ -350,12 +356,14 @@
 
 Once DBI v2.0 is available, the other enhancements can be implemented
 incrementally on the updated foundations. Priorities for those
-changes have not yet been set.
+changes have not yet been set. If your company would benefit from
+a specific feature, it could pay to sponsor early development of it.
+
 
 =head1 RESOURCES AND CONTRIBUTIONS
 
 This roadmap does not address the resources required to implement
-in a timely manner the changes for DBI v2.0 and beyond.
+the changes for DBI v2.0 and beyond.
 
 See L<http://dbi.perl.org/contributing> for I<how you can help>.
 
