Re: Unicode and Sybase univarchar

Alexander Foken Thu, 03 Jun 2010 13:23:53 -0700

Sorry, forgot to CC to the list ...

-------- Original Message --------
Subject:        Re: Unicode and Sybase univarchar
Date:   Thu, 03 Jun 2010 22:20:37 +0200
From:   Alexander Foken <alexan...@foken.de>
To:     Dave Rolsky <auta...@urth.org>
References:     <alpine.deb.0.9999.1006031018000.9...@urth.org>




On 03.06.2010 17:22, Dave Rolsky wrote:

What's really bizarre is that when I select the value back I get something like "0065006d00200064006100730068003a002000e200800094".


Yes, that's a literal string containing a series of 2-digit hex numbers!

I can translate this back to Perl unicode with this madness:

    my $chars = do {
        use bytes;

        join q{}, map { chr( eval '0x' . $_ ) } $fromdb =~ /(....)/g;
    };

    my $unicode = decode( 'utf8', $chars );

Really strange way to avoid pack()/unpack(). At the very least, you can get rid of the evil string-eval: simply use hex($_). The combination of "use bytes", chr() with an argument larger than 0x00FF, and decode() also looks very strange. $fromdb contains only hex digits, so there should be no need to "use bytes". Unless you force byte mode, chr() should already return perfect Unicode, flagged as such. So ...


my $unicode=join('',map { chr hex $_ } $fromdb=~/([0-9a-fA-F]{4})/g);

... should just work. It doesn't quite, because the hex string is not a plain dump of a 16-bit Unicode encoding. Instead, it is a UTF-8 byte stream written in a 16-bit hex format, four hex digits for each byte. Each and every 16-bit word has its most significant byte set to 0.
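A minimal sketch of decoding that format without eval or "use bytes": treat every 4-digit group as one byte, collect the bytes with pack(), and let Encode do the UTF-8 decoding. The variable names are only for illustration, and the sample value is the one quoted above; this assumes the column always comes back in exactly this padded form.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Value as selected back from the database (copied from the message above).
my $fromdb = '0065006d00200064006100730068003a002000e200800094';

# Each 4-digit group carries one UTF-8 byte in its low half;
# the high half is always 00, so every value fits into a byte.
my @words = $fromdb =~ /([0-9a-fA-F]{4})/g;
my $bytes = pack 'C*', map { hex $_ } @words;

# Turn the UTF-8 byte string into Perl characters.
my $unicode = decode('UTF-8', $bytes);

printf "%d characters, last one is U+%04X\n",
    length($unicode), ord(substr($unicode, -1));
```

For the sample value this should report 10 characters, the last one being U+2014.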


If it were a dump of a 16-bit Unicode encoding, it would read
"0065006d00200064006100730068003a00202014"
and not
"0065006d00200064006100730068003a002000e200800094"
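The two strings can be reproduced from the original text with Encode, which makes the difference easy to see. This is only a sketch for comparison: the sample text is inferred from the hex dump, and the padded form is built by hand here, so it says nothing about where in the real stack the padding happens.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text inferred from the hex dump: "em dash: " followed by U+2014.
my $text = "em dash: \x{2014}";

# Dump of a real 16-bit encoding: one 4-digit group per UTF-16 code unit.
my $utf16_dump = unpack 'H*', encode('UTF-16BE', $text);

# What came back instead: the UTF-8 bytes, each one padded to 16 bits.
my $padded_utf8_dump = join '',
    map { sprintf '%04x', $_ } unpack 'C*', encode('UTF-8', $text);

print "$utf16_dump\n$padded_utf8_dump\n";
```

The first line printed is the expected 16-bit dump, the second is the string actually returned by the database.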

Your call to decode() compensates for that, probably because you encoded once too often before writing the data into the database.

Apart from that, have a look at the tests in DBD::Oracle. There are a few tests for Unicode round trips in 40UnicodeRoundTrip.t and 41Unicode.t; try to run them on DBD::Sybase.

Also consider using DBD::ODBC, a Unicode-capable ODBC manager (the one(s) on Windows is/are fine) and a Unicode-capable ODBC driver for Sybase. It may cost you a few CPU cycles for the extra layers, but DBD::ODBC supports Unicode quite well (on non-Windows, you need to explicitly enable Unicode).


Alexander

--

Alexander Foken
mailto:alexan...@foken.de  http://www.foken.de/alexander/
