Re: Adding utf8 support to DBD::mysql

Charles Jardine Mon, 01 May 2006 03:21:50 -0700

Tim Bunce wrote:

[I'm at the mysql conference and Patrick asked me about adding utf8
support to DBD::mysql.]


 [snip]

*** Detecting Inconsistencies

If the connection character set is _not_ utf8 but the application calls
the driver with data (or SQL statement) that has the UTF8 flag set, then
it could issue a warning. In practice that may be too noisy for
people that have done their own workarounds for utf8 support. If so then
they could be changed to level 1 trace messages.

If the connection character set _is_ utf8, and the application calls
the driver with data (or SQL statement) that does _not_ have the UTF8
flag set but _does_ have bytes with the high bit set, then the driver
should issue a warning. The checking for high bit set is an extra cost
so this should only be enabled if tracing and/or an attribute is set
(perhaps called $dbh->{mysql_charset_checks} = 1)


Tim,

You don't explicitly say what you are proposing should be done with
the anomalous data. I guess, by analogy with the behaviour of
DBD::Oracle's handling of SQL statements, that the implicit proposal
is to pass the octets of the anomalous string unchanged across the
connection. This will result in octet strings which perl has flagged
as being utf8-encoded being passed over connections which expect
byte encoding, and vice versa.

I think that this is wrong as the default behaviour for a DBD, and
I would be sorry to see another DBD converted to behave in this way.

The default behaviour I would like to see is as follows:

If a utf8-flagged string is presented for transmission over a
byte-encoded connection, an attempt should be made to downgrade
the string to byte encoding. This will fail if the string contains
characters with codepoints > 255. Such failure should be treated
as an error.

If a string without the utf8 flag is presented for transmission
across a utf8-encoded connection, it should simply be upgraded
to utf8 encoding. This cannot fail.
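
To make the two rules above concrete, here is a minimal pure-Perl
sketch. The helper name octets_for_connection is hypothetical (it is
not part of DBI or DBD::mysql), and "byte-encoded" is assumed to mean
one octet per character for codepoints 0..255. A real driver would
presumably do the equivalent in XS with sv_utf8_upgrade and
sv_utf8_downgrade; this only models the octets that would be sent.

```perl
use strict;
use warnings;

# Hypothetical helper (not a DBI or DBD::mysql API): return the octets
# that would be sent for $string, given the connection's encoding.
sub octets_for_connection {
    my ($string, $connection_is_utf8) = @_;
    my $copy = $string;    # work on a copy; leave the caller's scalar alone

    if ($connection_is_utf8) {
        # Produces the UTF-8 octets of the characters, whether or not
        # the utf8 flag was set on input. This cannot fail.
        utf8::encode($copy);
    }
    else {
        # With a true second argument, utf8::downgrade returns false
        # instead of dying when a codepoint > 255 is present.
        utf8::downgrade($copy, 1)
            or die "character with codepoint > 255 on a byte-encoded connection\n";
    }
    return $copy;
}

# Unflagged e-acute sent over a utf8 connection: upgraded to two octets.
print unpack("H*", octets_for_connection("\x{e9}", 1)), "\n";    # c3a9

# Flagged e-acute sent over a byte connection: downgraded to one octet.
my $flagged = "\x{e9}";
utf8::upgrade($flagged);
print unpack("H*", octets_for_connection($flagged, 0)), "\n";    # e9

# A codepoint > 255 over a byte connection is treated as an error.
my $ok = eval { octets_for_connection("\x{263A}", 0); 1 };
print "error: $@" unless $ok;
```

The point of the sketch is that the result depends only on the
characters in the string and the connection encoding, never on the
state of the flag.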

I am aware that a DBD which does not automatically upgrade and
downgrade may provide a useful compatibility bridge for programs
originally written to cope with DBDs without Unicode support.
However, such DBDs are not compatible with the spirit of
perldoc perluniintro and perldoc perlunicode. To quote from the
former:

     o   How Do I Know Whether My String Is In Unicode?

         You shouldn't care.  No, you really shouldn't.  No,
         really.  If you have to care--beyond the cases described
         above--it means that we didn't get the transparency of
         Unicode quite right.

         Okay, if you insist: [...]


A DBD which does not handle upgrading and downgrading itself
doesn't get the transparency quite right. The writer of a program
using such a driver has to care about the utf8 flag, since
strings which compare equal in perl, but differ in the setting
of the flag, will produce different results when processed by
the DBD. This ought not to be the way of the future.
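
As an illustration of the problem, the following sketch builds two
strings that compare equal in perl but differ in the flag, and shows
the octets a pass-through driver would see. The helper raw_octets is
hypothetical and only models "send the internal buffer unchanged"; it
is not taken from any actual DBD.

```perl
use strict;
use warnings;

# Hypothetical model of a driver that passes the internal buffer through
# unchanged: a flagged string is held internally as UTF-8, an unflagged
# one as one octet per character.
sub raw_octets {
    my ($string) = @_;
    my $copy = $string;
    utf8::encode($copy) if utf8::is_utf8($copy);
    return $copy;
}

my $plain    = "caf\x{e9}";    # flag off, 4 octets internally
my $upgraded = $plain;
utf8::upgrade($upgraded);      # same characters, flag on, 5 octets internally

print "equal in perl\n" if $plain eq $upgraded;
print unpack("H*", raw_octets($plain)),    "\n";    # 636166e9
print unpack("H*", raw_octets($upgraded)), "\n";    # 636166c3a9
```

The program sees one string; the database would see two different ones.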

--
Charles Jardine - Computing Service, University of Cambridge
[EMAIL PROTECTED]    Tel: +44 1223 334506, Fax: +44 1223 334679
