Re: Unicode and Sybase univarchar

Alexander Foken Fri, 04 Jun 2010 05:45:59 -0700

... should just work. It doesn't quite, because the hex string is not just a dump of a 16 Bit Unicode encoding, but it is a UTF-8 byte stream written with a 16 Bit Hex Format for each byte. Each and every 16-Bit-Word has its most significant byte set to 0.
If it was a dump of a 16 Bit Unicode encoding, it should read
"0065006d00200064006100730068003a00202014"
and not
"0065006d00200064006100730068003a002000e200800094"
Your call to decode() compensates for that, probably because you encoded once too much before writing the data into the database.
I did not encode at all. I simply created a utf8 string in Perl land and inserted it into Sybase.

And Sybase / DBD::Sybase happily ignored Perl's UTF8 flag and stored the byte stream as if it were characters.
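The difference between the two hex dumps quoted above can be reproduced outside of Perl and Sybase. What follows is only a minimal Python sketch of the byte-level effect, not what DBD::Sybase actually executes; the sample text "em dash: " followed by U+2014 is inferred from the dumps, and the helper names hex16() and recover() are made up for this illustration.

```python
# Sample text inferred from the hex dumps: "em dash: " plus U+2014 (EM DASH).
text = "em dash: \u2014"


def hex16(values):
    """Format each integer as one 16-bit word, i.e. four hex digits."""
    return "".join(f"{v:04x}" for v in values)


# A real 16-bit Unicode dump: one 16-bit word per character
# (every character here is in the Basic Multilingual Plane).
proper = text.encode("utf-16-be").hex()

# What ended up in the database: every single UTF-8 byte was treated
# as a character of its own and widened to a 16-bit word.
mangled = hex16(text.encode("utf-8"))


def recover(hex_dump):
    """Undo the damage: narrow each 16-bit word to a byte, decode as UTF-8."""
    words = [int(hex_dump[i:i + 4], 16) for i in range(0, len(hex_dump), 4)]
    return bytes(words).decode("utf-8")


print(proper)
print(mangled)
print(recover(mangled) == text)
```

The recover() step is roughly what an extra decode() call on the fetched value does: it works around the problem on the client side, but the data stored in the database stays wrong.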

Really, DBD::Sybase needs to handle any character set translation, not the end user.

Right. (But remember that DBI was there before Unicode support was added to Perl, and also most DBDs are older than the Unicode support. Before Unicode was there, you just passed bytes around and everything just worked.)

For the same reason, but with a different DBD, I hacked the first Unicode patch for DBD::ODBC, just enough code to have proper Unicode support in bind parameters from Perl to the database, and in returned fetch values. DBD::ODBC has been improved since the patch was merged by Martin J. Evans into v1.14, and it now supports Unicode in many other places, too. The DBI API was also improved during that process to allow Unicode SQL query strings.

That's why I proposed to switch to DBD::ODBC: It is well tested and supports Unicode as well as the ODBC driver does.

The raw patch is still available at <http://www.alexander-foken.de/unicode-patch.txt.gz>. DBD::Sybase obviously lacks such a patch. Sybase may have a Unicode API, but no part of DBD::Sybase uses it (properly). In DBD::ODBC, the patch "just" tests the UTF8 flag for all relevant data coming from Perl and converts UTF8 to the UCS2 encoding required by the ODBC API, and converts UCS2 encoded data back to UTF8, setting the UTF8 flag when needed. Most of the new test just tests that all conversions work, even inside the database. The length check is very primitive, but a good indicator to test if the database saw bytes (length too large) or characters (length just right). The original patch has some minor problems in the test code with ancient perls that were fixed during DBD::ODBC development, see DBD::ODBC Changes.
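The idea behind the length check can be sketched in a few lines. This is not the test code from the patch, just a hedged Python illustration: reported_length stands in for whatever the database's own length function returns for the stored value, and the function name is hypothetical.

```python
def what_did_the_database_count(reported_length, original):
    """Guess whether the database stored characters or raw UTF-8 bytes.

    reported_length is assumed to come from the database's own length
    function; original is the string that was inserted.
    """
    chars = len(original)                   # length in characters
    octets = len(original.encode("utf-8"))  # length in UTF-8 bytes
    if chars == octets:
        # Pure ASCII test data cannot tell the two cases apart.
        return "ambiguous"
    if reported_length == chars:
        return "characters"   # length just right
    if reported_length == octets:
        return "bytes"        # length too large
    return "unexpected"


sample = "em dash: \u2014"  # 10 characters, 12 bytes in UTF-8
print(what_did_the_database_count(10, sample))  # characters
print(what_did_the_database_count(12, sample))  # bytes
```

The check only says something when the test string contains at least one non-ASCII character, which is why test data for this kind of check needs characters outside the ASCII range.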

Now you "just" need to find someone who is willing and has the time to patch DBD::Sybase ... ;-)

The "new" tests should work nearly unmodified with a properly patched DBD::Sybase.


Alexander
