... should just work. It doesn't quite, because the hex string is not
a dump of a 16-bit Unicode encoding. It is a UTF-8 byte stream in
which each byte is written as a 16-bit hex word, so every 16-bit word
has its most significant byte set to 0.
If it were a dump of a 16-bit Unicode encoding, it would read
"0065006d00200064006100730068003a00202014"
and not
"0065006d00200064006100730068003a002000e200800094"
Your call to decode() compensates for that, probably because you encoded
once too often before writing the data into the database.
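
To make the difference between the two hex strings concrete, here is a
minimal Perl sketch. It is not taken from any driver; the test string
"em dash: " followed by U+2014 is simply what the hex dumps above decode
to, and the variable names are made up for illustration. It builds both
dumps and shows why one extra decode() repairs the second one.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "em dash: \x{2014}";

# A real 16-bit Unicode dump: one UTF-16BE code unit per character.
my $utf16_hex = unpack 'H*', encode('UTF-16BE', $text);

# What was actually stored: every UTF-8 byte widened to a 16-bit word.
my $widened_hex = join '',
    map { sprintf '%04x', $_ } unpack 'C*', encode('UTF-8', $text);

print "$utf16_hex\n";    # 0065006d00200064006100730068003a00202014
print "$widened_hex\n";  # 0065006d00200064006100730068003a002000e200800094

# Reading the widened dump back yields one "character" per UTF-8 byte.
# Decoding those bytes as UTF-8 recovers the original text.
my $bytes    = join '', map { chr hex } $widened_hex =~ /(....)/g;
my $repaired = decode('UTF-8', $bytes);
print "repaired\n" if $repaired eq $text;
```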
I did not encode at all. I simply created a utf8 string in Perl land
and inserted it into Sybase.
And Sybase / DBD::Sybase happily ignored Perl's UTF8 flag and stored the
byte stream as if it were characters.
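
What "ignoring the UTF8 flag" means can be sketched in pure Perl. This is
only a simulation under the assumption that the driver copies Perl's
internal buffer without looking at the flag; it does not call
DBD::Sybase, and $as_seen_by_driver is a made-up name.

```perl
use strict;
use warnings;

my $text = "em dash: \x{2014}";   # 10 characters, UTF8 flag on

# Simulate a driver that takes the internal UTF-8 buffer as-is:
# utf8::encode turns the copy into a plain byte string (flag off).
my $as_seen_by_driver = $text;
utf8::encode($as_seen_by_driver);

printf "flag on: %s\n", utf8::is_utf8($text) ? 'yes' : 'no';
printf "characters: %d, bytes handed over: %d\n",
    length $text, length $as_seen_by_driver;   # 10 and 12
```

If the database then treats each of those 12 bytes as one character, the
single U+2014 ends up stored as the three characters 0xE2 0x80 0x94.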
Really, DBD::Sybase needs to handle any character set translation, not
the end user.
Right. (But remember that DBI was there before Unicode support was added
to Perl, and most DBDs are also older than Perl's Unicode support. Before
Unicode was there, you just passed bytes around and everything just worked.)
For the same reason, but with a different DBD, I hacked the first
Unicode patch for DBD::ODBC, just enough code to have proper Unicode
support in bind parameters from Perl to the database, and in returned
fetch values. DBD::ODBC has been improved since the patch was merged by
Martin J. Evans into v1.14, and it now supports Unicode in many other
places, too. The DBI API was also improved during that process, so that
SQL query strings can now contain Unicode as well.
That's why I proposed to switch to DBD::ODBC: It is well tested and
supports Unicode as well as the underlying ODBC driver does.
The raw patch is still available at
<http://www.alexander-foken.de/unicode-patch.txt.gz>. DBD::Sybase
obviously lacks such a patch. Sybase may have a Unicode API, but no part
of DBD::Sybase uses it (properly). In DBD::ODBC, the patch "just" tests
the UTF8 flag for all relevant data coming from Perl and converts UTF8
to the UCS2 encoding required by the ODBC API, and converts UCS2 encoded
data back to UTF8, setting the UTF8 flag when needed. Most of the new
test code just checks that all conversions work, even inside the database. The
length check is very primitive, but a good indicator to test if the
database saw bytes (length too large) or characters (length just right).
The original patch has some minor problems in the test code with ancient
perls; these were fixed during DBD::ODBC development, see DBD::ODBC Changes.
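
The conversion idea can be sketched in Perl with the Encode module. This
is not the patch's code (the patch works inside the driver), and the
helpers to_driver() and from_driver() are hypothetical names used only
for illustration. The sketch also assumes little-endian UCS2; the byte
order a real ODBC driver expects depends on the platform.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: decide how to hand a bind value to the driver.
# Flagged strings are converted to UCS2; everything else passes as bytes.
sub to_driver {
    my ($value) = @_;
    return utf8::is_utf8($value)
        ? { type => 'wchar', data => encode('UCS-2LE', $value) }
        : { type => 'char',  data => $value };
}

# Hypothetical helper: turn a fetched value back into a Perl string.
sub from_driver {
    my ($bound) = @_;
    return $bound->{type} eq 'wchar'
        ? decode('UCS-2LE', $bound->{data})
        : $bound->{data};
}

my $text  = "em dash: \x{2014}";
my $bound = to_driver($text);
my $back  = from_driver($bound);

printf "%s: %d bytes to the driver, %d characters back\n",
    $bound->{type}, length $bound->{data}, length $back;
```

A round trip that returns 10 characters for this string means characters
were preserved; a result of 12 would indicate that bytes were treated as
characters somewhere, which is the same idea as the length check in the
tests.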
Now you "just" need to find someone who is willing and has the time to
patch DBD::Sybase ... ;-)
The "new" tests should work nearly unmodified with a properly patched
DBD::Sybase.
Alexander