Hi all,

I'd like to issue a word of warning about the MySQL driver. While  
running a few tests with an app that uses libdbi  
(http://refdb.sourceforge.net) I came across a problem which I  
initially thought to be a bug in the MySQL version that I used. Turns  
out that it was a problem with the MySQL driver. Applications that use  
this driver may fail to find particular data when running queries.  
Even worse, if you upgrade to a fixed version of the driver, your data  
may come back corrupted, making you think the fixed driver is broken.  
I've checked in a fix yesterday. I cannot recommend to upgrade to the  
current cvs version unless you understand the nature of the possible  
data corruption.

The problem is known to occur under the following circumstances (other  
scenarios are possible though):

1) Your application set the encoding to "auto" which is supposed to  
make the client library encoding match the encoding of the database  
you're connecting to.

2) Your database uses a multibyte encoding like UTF-8

3) Your data contain multibyte characters like in UTF-8

The bug caused the driver to use the IANA name of the target encoding  
instead of the MySQL-specific one, causing the initial "SET NAMES"  
command to be without effect. As the command seemed so trivial the  
driver did not check for errors. Now with the default client encoding  
in effect, MySQL may attempt to transform the incoming data UTF-8 data  
from "latin1" to "UTF-8", causing e.g. umlauts to take up 4 instead of  
2 bytes. If you later retrieve the data through the same application,  
these characters will correctly be translated back to their 2-byte  
versions, making everything seem alright. However, if you attempt to  
match an umlaut in a SELECT statement, it will always fail because the  
stored version differs from what you're searching for.

Obviously, if you'll switch to a fixed version of the driver, the data  
won't be translated back as the driver now (correctly, but to the  
wrong end) thinks that no conversion is required.

If the application that you use libdbi for allows to retrieve the  
contents of the database one way or another, you should think about  
ways to detect the existence of multibyte characters in your data. If  
there are none, you're pretty much on the safe side. If there are,  
you'd have to find a way to retrieve all data with the old version of  
the driver, and add them again using the new version.

Please feel free to discuss these kinds of upgrade problems right on  
this list, as others may benefit from your thoughts. Please also  
accept my apologies for screwing up this feature of the MySQL driver.

regards,
Markus


-- 
Markus Hoenicka
[EMAIL PROTECTED]
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de


-------------------------------------------------------------------------
SF.Net email is sponsored by: 
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
libdbi-users mailing list
libdbi-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libdbi-users

Reply via email to