... should just work. It doesn't quite, because the hex string is not a dump of a 16-bit Unicode encoding; it is a UTF-8 byte stream with each byte written as a 16-bit hex value. Every 16-bit word has its most significant byte set to 0.

If it were a dump of a 16-bit Unicode encoding, it would read
"0065006d00200064006100730068003a00202014"
and not
"0065006d00200064006100730068003a002000e200800094"
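
To make the difference concrete, here is a minimal sketch that reproduces both hex strings from the same text. It assumes the original string was "em dash: " followed by U+2014 (which is what the dumps decode to) and that the 16-bit form is big-endian; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed sample text: "em dash: " followed by U+2014 (EM DASH).
my $text = "em dash: \x{2014}";

# A genuine 16-bit dump: one 16-bit code unit per character (big-endian).
my $ucs2_hex = unpack 'H*', encode('UTF-16BE', $text);

# What the database showed instead: the UTF-8 byte stream,
# with every single byte widened to a 16-bit value (high byte 0).
my $widened_hex = join '',
    map { sprintf '%04x', $_ } unpack 'C*', encode('UTF-8', $text);

print "16-bit Unicode : $ucs2_hex\n";
print "widened UTF-8  : $widened_hex\n";
```

The two strings only differ at the end: the single code unit 2014 versus the three widened bytes 00e2 0080 0094.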

Your call to decode() compensates for that, probably because you encoded once too often before writing the data into the database.

I did not encode at all. I simply created a utf8 string in Perl land and inserted it into Sybase.
And Sybase / DBD::Sybase happily ignored Perl's UTF8 flag and stored the byte stream as if it were characters.
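
The effect can be simulated without a database. The following is only a sketch: it assumes the driver passes Perl's internal UTF-8 bytes through unchanged and hands them back one byte per character, which is modelled here with a plain encode(). No DBD::Sybase call is involved, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string in Perl land (UTF8 flag on, 10 characters).
my $text = "em dash: \x{2014}";

# Stand-in for the round trip through the database: the UTF8 flag is
# ignored, so the database only ever sees the raw UTF-8 bytes.
my $fetched = encode('UTF-8', $text);

# The fetched value is 12 "characters" long, one per UTF-8 byte.
printf "fetched length:  %d\n", length $fetched;

# decode() turns the byte stream back into characters, which is why
# the extra decode() call appears to fix things.
my $repaired = decode('UTF-8', $fetched);
printf "repaired length: %d\n", length $repaired;
print  "round trip ok\n" if $repaired eq $text;
```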

Really, DBD::Sybase needs to handle any character set translation, not the end user.

Right. (But remember that DBI was there before Unicode support was added to Perl, and most DBDs are also older than the Unicode support. Before Unicode was there, you just passed bytes around and everything just worked.)

For the same reason, but with a different DBD, I hacked the first Unicode patch for DBD::ODBC, just enough code to have proper Unicode support in bind parameters from Perl to the database, and in returned fetch values. DBD::ODBC has been improved since the patch was merged by Martin J. Evans into v1.14, and it now supports Unicode in many other places, too. The DBI API was also improved during that process, so that SQL query strings can now contain Unicode as well.

That's why I proposed to switch to DBD::ODBC: It is well tested and supports Unicode as well as the ODBC driver does.

The raw patch is still available at <http://www.alexander-foken.de/unicode-patch.txt.gz>. DBD::Sybase obviously lacks such a patch. Sybase may have a Unicode API, but no part of DBD::Sybase uses it (properly). In DBD::ODBC, the patch "just" tests the UTF8 flag for all relevant data coming from Perl and converts UTF8 to the UCS2 encoding required by the ODBC API, and converts UCS2 encoded data back to UTF8, setting the UTF8 flag when needed. Most of the new test just checks that all conversions work, even inside the database. The length check is very primitive, but a good indicator of whether the database saw bytes (length too large) or characters (length just right). The original patch has some minor problems in the test code with ancient perls that were fixed during DBD::ODBC development, see DBD::ODBC Changes.
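
The idea behind the conversion and the length check can be sketched in pure Perl. This is not the patch itself (which works inside the driver's C code); to_ucs2() and from_ucs2() are hypothetical helpers, and little-endian UCS-2 is an assumption made only for this example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helpers standing in for the driver-side conversion.
sub to_ucs2   { my ($chars)  = @_; return encode('UCS-2LE', $chars)  }
sub from_ucs2 { my ($octets) = @_; return decode('UCS-2LE', $octets) }

my $text = "em dash: \x{2014}";

# Outgoing data: only strings carrying the UTF8 flag need converting.
my $flagged = utf8::is_utf8($text) ? 1 : 0;
my $wire    = to_ucs2($text);

# Incoming data: convert back to a Perl character string.
my $round_trip = from_ucs2($wire);

# The primitive length check: a database that saw characters reports 10,
# one that saw raw UTF-8 bytes reports 12 for this string.
my $char_length = length $text;                   # 10
my $byte_length = length encode('UTF-8', $text);  # 12

print "UTF8 flag set: $flagged\n";
print "characters: $char_length, UTF-8 bytes: $byte_length\n";
print "round trip ok\n" if $round_trip eq $text;
```

In a real test the two lengths would be compared against what the database itself reports for the stored value, rather than computed locally.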

Now you "just" need to find someone who is willing and has the time to patch DBD::Sybase ... ;-) The "new" tests should work nearly unmodified with a properly patched DBD::Sybase.

Alexander
