Re: Add Unicode Support to the DBI

Tim Bunce Tue, 08 Nov 2011 08:15:35 -0800

On Tue, Nov 08, 2011 at 02:45:39PM +0000, Martin J. Evans wrote:
> On 08/11/11 13:16, Tim Bunce wrote:
> >On Mon, Nov 07, 2011 at 01:37:38PM +0000, Martin J. Evans wrote:
> 
> >2. Try to make a data-driven common test script.
> 
> There is already one attached to the bottom of the post and referred to in 
> the post - probably was not very clear.


Ah, it's in the middle, in the "DBDs" section.
Could you put it in github or the DBI repo?
I may find some time to hack on it.

> >     It should fetch the length of the stored value, something like:
> >         CREATE TABLE t (c VARCHAR(10));
> >         INSERT INTO t VALUES (?)   <= $sth->execute("\x{263A}") # smiley
> >         SELECT LENGTH(c), c FROM t
> 
> It does that with a euro \x{20ac}
> 
> >     Fetching the LENGTH is important because it tells us if the DB is
> >     treating the value as Unicode.
> 
> It doesn't do that but it checks what went in \x{20ac} is what comes
> out which is the same as checking the length since what goes in is a
> euro character. For a euro to come back out at the minimum the DBD
> would have to decode the data from the database.

It's not only important that what goes in comes back out again, but also
that what goes in is interpreted by the database in the right way.
Otherwise there are lots of more subtle issues, like sorting bugs caused
by the database 'seeing' encoded bytes instead of Unicode characters.
(Hence my point about the description for DBD::Unify being incomplete.)

It's _possible_ for a db to store a euro as a single byte (single-byte
encodings such as cp1252 and ISO-8859-15 include it). So it's possible
for the test to yield a false positive.  I prefer using a smiley because
it's not possible to store it in a single byte.
So if LENGTH(c) returns 1 we can be very confident that the db is
interpreting the data correctly.
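
A minimal sketch of the difference, assuming nothing beyond Perl's core
Encode module. The byte_length() helper is hypothetical and exists only
for this illustration; it is not part of the DBI or of Martin's script.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $euro   = "\x{20AC}";   # EURO SIGN
my $smiley = "\x{263A}";   # WHITE SMILING FACE

# Hypothetical helper: number of bytes $char occupies in $encoding,
# or undef if the encoding cannot represent the character at all.
sub byte_length {
    my ($encoding, $char) = @_;
    my $copy  = $char;   # encode() may modify its source when CHECK is set
    my $bytes = eval { encode($encoding, $copy, Encode::FB_CROAK) };
    return defined $bytes ? length($bytes) : undef;
}

for my $encoding (qw(cp1252 iso-8859-15 UTF-8)) {
    my $e = byte_length($encoding, $euro);
    my $s = byte_length($encoding, $smiley);
    printf "%-12s euro=%s smiley=%s\n",
        $encoding,
        defined $e ? $e : 'n/a',
        defined $s ? $s : 'n/a';
}
```

The euro fits in one byte in both single-byte encodings, so a length of 1
for it doesn't prove much. The smiley has no single-byte form in any of
them, so a length of 1 for it can only be a character count.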

> >Thanks again. I've only given it a quick skim. I'll read it again before LPW.
> 
> I will try and update it before then but a) it is my birthday tomorrow and b) 
> I'm a bit under the weather at the moment.

HAPPY BIRTHDAY!
Get well soon!

> >Meanwhile, it would be great if people could contribute the info for #1.
> 
> I am happy to collect any such info and write it up so please at least cc me.

> >p.s. Using data_diff() http://search.cpan.org/~timb/DBI/DBI.pm#data_diff
> >would make the tests shorter.
> >     my $sample_string = "\x{263A}";
> >     ...
> >     print data_diff($sample_string, $returned_string);
> 
> Yes, the output is a bit screwed because I was running the test code
> on Windows in a dos terminal and I've never got that working properly.

Ouch.

> Data::Dumper produces better output but I forgot data_diff - I will
> change where relevant.
> 
> BTW, the example code in each DBD was not really the test, it was just
> an example. My aim was to produce one script which ran against any DBD and
> that is attached to the end of the blog post. It's just it was too
> long to incorporate into a blog posting whereas the short simple
> examples were better.

Yep. Good idea. I'd simply missed the link when I skimmed it.

Thanks again for driving this forward, Martin.

Tim.
