Sorry, forgot to CC to the list ...

-------- Original Message --------
Subject:        Re: Unicode and Sybase univarchar
Date:   Thu, 03 Jun 2010 22:20:37 +0200
From:   Alexander Foken <alexan...@foken.de>
To:     Dave Rolsky <auta...@urth.org>
References:     <alpine.deb.0.9999.1006031018000.9...@urth.org>



On 03.06.2010 17:22, Dave Rolsky wrote:

What's really bizarre is that when I select the value back I get something like "0065006d00200064006100730068003a002000e200800094".

Yes, that's a literal string containing a series of 2-digit hex numbers!

I can translate this back to Perl unicode with this madness:

    my $chars = do {
        use bytes;

        join q{}, map { chr( eval '0x' . $_ ) } $fromdb =~ /(....)/g;
    };

    my $unicode = decode( 'utf8', $chars );

That is a really strange way to avoid pack()/unpack(). At the very least, you can get rid of the evil string eval: simply use hex($_). The combination of "use bytes", chr() with an argument larger than 0x00FF, and decode() also looks very strange. $fromdb contains only hex digits, so there should be no need for "use bytes". Unless you force byte mode, chr() already returns proper Unicode characters, flagged as such. So ...

my $unicode=join('',map { chr hex $_ } $fromdb=~/([0-9a-fA-F]{4})/g);

... should just work. It doesn't quite, because the hex string is not a plain dump of a 16-bit Unicode encoding; it is a UTF-8 byte stream with each byte written as a 16-bit (four-digit) hex word. Every single 16-bit word has its most significant byte set to 0.
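Since pack() came up: here is a minimal sketch of the same decoding without string eval or "use bytes". It assumes that every 16-bit word in the string is at most 0x00FF, which holds for the sample value but is only an assumption for other data, so the sketch checks it first.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The value as it came back from the database (copied from the sample above).
my $fromdb = '0065006d00200064006100730068003a002000e200800094';

# Split into 16-bit words. Each word is expected to carry exactly one
# byte of the UTF-8 stream in its low byte.
my @words = map { hex $_ } $fromdb =~ /([0-9a-fA-F]{4})/g;
die "word above 0x00FF, this is not a padded byte stream\n"
    if grep { $_ > 0xFF } @words;

# pack 'C*' turns the list of small integers into a byte string ...
my $bytes = pack 'C*', @words;

# ... and decode() turns the UTF-8 bytes into Perl characters.
my $unicode = decode('UTF-8', $bytes);

printf "%d characters, last one is U+%04X\n",
    length($unicode), ord(substr($unicode, -1));
```

For the sample value this should report 10 characters, the last one being U+2014 (EM DASH).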

If it were a dump of a 16-bit Unicode encoding, it should read
"0065006d00200064006100730068003a00202014"
and not
"0065006d00200064006100730068003a002000e200800094"
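For comparison, a real 16-bit dump like the first string could be decoded directly. This is only a sketch: it assumes big-endian byte order (UTF-16BE), which matches the digit order shown here but is not confirmed for what Sybase actually returns.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The hex string as it would look for a genuine 16-bit Unicode dump.
my $utf16_hex = '0065006d00200064006100730068003a00202014';

# pack 'H*' converts pairs of hex digits into raw bytes (20 bytes here).
my $bytes = pack 'H*', $utf16_hex;

# Interpret the bytes as big-endian UTF-16 (an assumption, see above).
my $unicode = decode('UTF-16BE', $bytes);

printf "%d bytes, %d characters, last one is U+%04X\n",
    length($bytes), length($unicode), ord(substr($unicode, -1));
```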

Your call to decode() compensates for that. The most likely cause is that you encoded the data once too often before writing it to the database.
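To illustrate the suspected double encoding, the following sketch reproduces both hex strings from the same text. hex_words() is a hypothetical helper written for this mail, not part of DBI or DBD::Sybase; it only models the effect of storing each character as one 16-bit word and makes no claim about what the driver really does internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: dump every character of a string as a
# four-digit hex word, the way the column value appears on select.
sub hex_words {
    my ($string) = @_;
    return join '', map { sprintf '%04x', ord $_ } split //, $string;
}

my $text = "em dash: \x{2014}";

# Characters stored as they are: one word per character.
my $clean = hex_words($text);

# Encoded to UTF-8 first, then each *byte* stored as if it were a character.
my $double = hex_words(encode('UTF-8', $text));

print "clean:  $clean\n";
print "double: $double\n";
```

The "double" line should match the string that came back from the database, and the "clean" line the one a plain 16-bit dump would give.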

Apart from that, have a look at the tests in DBD::Oracle. There are a few tests for Unicode round trips in 40UnicodeRoundTrip.t and 41Unicode.t; try running them against DBD::Sybase.

Also consider using DBD::ODBC with a Unicode-capable ODBC driver manager (the one(s) on Windows is/are fine) and a Unicode-capable ODBC driver for Sybase. The extra layers may cost you a few CPU cycles, but DBD::ODBC supports Unicode quite well (on non-Windows systems, you need to enable Unicode explicitly).

Alexander

--

Alexander Foken
mailto:alexan...@foken.de  http://www.foken.de/alexander/
