Patrick Galbraith <[EMAIL PROTECTED]> writes:

> email too soon ;) I was actually going to answer. Yes, we want to
> support UTF-8 - that's one of the feature major bullet-points for
> 3.0. I'm not sure what all is involved, and have to deal with these
> issues in my work on the federated storage engine, but the answer is a
> definite "yes" ;)

Having spent much effort understanding all the issues involved in Perl
utf8, I want to give a few hints on this since it is an issue that is
very often misunderstood.

The issue is more complex than just "setting the utf8 flag". In fact,
when people say that, it usually indicates that they are doing the
wrong thing. Perl programmers should _never_ have to see or even know
about the utf8 flag; it should only ever be visible to XS code. All Perl
strings are sequences of Unicode characters.

The real issue is that Perl utf8 introduces two different _internal_
representations of strings (latin1 and utf8). This means that _all_ XS
code that accesses string data must first check which internal
representation is used for the string, and convert the string data if
necessary. It is _always_ wrong to directly use Perl string data from XS
code without appropriate conversion. The utf8 flag is the bit that
defines which internal format is used, and that is all it should be used
for.
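
To make this concrete, here is a minimal sketch in plain C (no Perl
headers, so it can be compiled on its own). The perl_str struct and the
function names are made up for illustration; in real XS code the flag
would be read with the SvUTF8() macro. It shows the same string "café"
held in both internal formats, and that the two compare equal only when
the flag is used to pick the decoding.

```c
#include <stddef.h>

/* Hypothetical stand-in for the string part of a Perl scalar: a byte
 * buffer plus the flag that says how the bytes encode the characters. */
typedef struct {
    const unsigned char *bytes;
    size_t len;
    int is_utf8;   /* 0: one byte per character (latin1), 1: UTF-8 */
} perl_str;

/* Decode the character at *pos and advance *pos past it.
 * Returns the Unicode value, or -1 on truncated or malformed input. */
long next_char(const perl_str *s, size_t *pos)
{
    int extra;
    long cp;
    unsigned char b;

    if (*pos >= s->len)
        return -1;
    b = s->bytes[(*pos)++];
    if (!s->is_utf8 || b < 0x80)
        return b;                      /* latin1 byte or ASCII */

    if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
    else
        return -1;                     /* not a valid lead byte */

    while (extra-- > 0) {
        unsigned char c;
        if (*pos >= s->len)
            return -1;
        c = s->bytes[(*pos)++];
        if ((c & 0xC0) != 0x80)
            return -1;                 /* not a continuation byte */
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}

/* Two strings are equal when they hold the same characters, no matter
 * which internal format each one happens to use. */
int same_characters(const perl_str *a, const perl_str *b)
{
    size_t i = 0, j = 0;

    while (i < a->len && j < b->len) {
        long ca = next_char(a, &i);
        long cb = next_char(b, &j);
        if (ca < 0 || cb < 0 || ca != cb)
            return 0;
    }
    return i == a->len && j == b->len;
}
```

Comparing the raw bytes of the two buffers directly would report them as
different, which is exactly the class of bug described above.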

So for DBD::mysql, whenever a string is passed to the MySQL API, it must
be converted from the Perl internal representation to the MySQL client
character set. Even if the client character set is latin1 it is
necessary to convert, since it is perfectly possible and normal for
latin1 characters to be stored in utf8 internal format in Perl strings!
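
As a rough sketch of that conversion for a latin1 client character set,
again in standalone C: to_client_latin1 is a hypothetical helper, not
part of DBD::mysql or the MySQL API. In XS the SvPVbyte() macro covers
much the same ground (it downgrades first and croaks if the string has
a character that does not fit in a byte), and SvPVutf8() is the
counterpart for a utf8 client character set.

```c
#include <stddef.h>

/* Convert a Perl-style string (bytes plus utf8 flag) to latin1 bytes
 * suitable for a latin1 client connection.
 * Returns the number of bytes written to dst, or -1 if the input is
 * malformed, holds a character above 255, or does not fit in cap. */
long to_client_latin1(const unsigned char *src, size_t len, int is_utf8,
                      unsigned char *dst, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = src[i++];
        unsigned int ch = b;

        if (is_utf8 && b >= 0x80) {
            /* Characters 128..255 are exactly the two-byte UTF-8
             * sequences led by 0xC2 or 0xC3; anything else is either
             * above 255 or not valid here. */
            if ((b != 0xC2 && b != 0xC3) || i >= len ||
                (src[i] & 0xC0) != 0x80)
                return -1;
            ch = ((b & 0x1Fu) << 6) | (src[i++] & 0x3Fu);
        }
        if (n >= cap)
            return -1;
        dst[n++] = (unsigned char)ch;
    }
    return (long)n;
}
```

Note that the flag-off branch is a plain copy, while the flag-on branch
has to decode. Passing the buffer through untouched in both cases would
send "cafÃ©" to the server whenever "café" happened to be stored in utf8
internal format.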

When data is pulled from the MySQL API, if the data contains characters
with Unicode value > 255 it must be stored in utf8 internal format. If
all characters are <= 255 it can in principle be stored in either utf8
or latin1 internal format, though in practice most modules that deal
with non-latin1 characters tend to use the utf8 internal format
exclusively.
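
The "> 255" test is cheap when the client character set is utf8. A
small sketch, assuming the bytes returned by the server are already
valid UTF-8 (has_char_above_255 is a made-up name, and validation is
deliberately left out):

```c
#include <stddef.h>

/* Return 1 if valid UTF-8 input contains a character above 255.
 * Characters up to 255 use lead bytes no higher than 0xC3, and
 * continuation bytes are always 0x80..0xBF, so any byte >= 0xC4 can
 * only be the lead byte of a character above 255. */
int has_char_above_255(const unsigned char *utf8, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (utf8[i] >= 0xC4)
            return 1;
    return 0;
}
```

When this returns 1 the value has to go into utf8 internal format; when
it returns 0 either format would be correct.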

Anyway, just my opinions on this issue which reflect the fact that we
have experienced much pain over XS code that didn't follow the above
guidelines!

 - Kristian.

-- 
Kristian Nielsen   [EMAIL PROTECTED]
Development Manager, Sifira A/S
