Patrick Galbraith <[EMAIL PROTECTED]> writes:

> email too soon ;) I was actually going to answer. Yes, we want to
> support UTF-8 - that's one of the feature major bullet-points for
> 3.0. I'm not sure what all is involved, and have to deal with these
> issues in my work on the federated storage engine, but the answer is a
> definite "yes" ;)

Having spent much effort understanding all the issues involved in Perl
utf8, I want to give a few hints on this, since it is an issue that is
very often misunderstood.

The issue is more complex than just "setting the utf8 flag". In fact,
when people say that, it usually indicates that they are doing the wrong
thing. Perl programmers should _never_ have to see or even know about
the utf8 flag; it should only ever be visible to XS code. All Perl
strings are sequences of Unicode characters.

The real issue is that Perl utf8 introduces two different _internal_
representations of strings (latin1 and utf8). This means that _all_ XS
code that accesses string data must first check which internal
representation is used for the string, and convert the string data if
necessary. It is _always_ wrong to directly use Perl string data from XS
code without appropriate conversion. The utf8 flag is the bit that
defines which internal format is used, and that is all it should be used
for.

So for DBD::mysql, whenever a string is passed to the MySQL API, it must
be converted from the Perl internal representation to the MySQL client
character set. Even if the client character set is latin1 it is
necessary to convert, since it is perfectly possible and normal for
latin1 characters to be stored in utf8 internal format in Perl strings!

When data is pulled from the MySQL API, if the data contains characters
with Unicode value > 255, it must be stored in utf8 internal format. If
all characters are <= 255, it can in principle be stored in either utf8
or latin1 internal format, though in practice most modules that deal
with non-latin1 characters tend to use the utf8 internal format
exclusively.

Anyway, these are just my opinions on the issue, which reflect the fact
that we have experienced much pain over XS code that didn't follow the
above guidelines!

 - Kristian.

-- 
Kristian Nielsen [EMAIL PROTECTED]
Development Manager, Sifira A/S
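
To make the conversion rule above concrete, here is a minimal sketch in plain C. It is not real XS and not DBD::mysql code: demo_str, demo_to_latin1 and demo_to_utf8 are hypothetical names invented for this illustration, and demo_str only stands in for a Perl scalar's string buffer plus its utf8 flag. In actual XS code one would normally lean on the perlapi helpers (SvUTF8, SvPVutf8, sv_utf8_downgrade and friends) rather than decode by hand. The point is only that the same character can arrive in either internal format, and that the flag must be checked before the bytes are handed to the client library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Perl string: raw bytes plus a flag that
 * says which internal representation those bytes use. */
typedef struct {
    const unsigned char *bytes;
    size_t len;
    int is_utf8;            /* plays the role of the utf8 flag */
} demo_str;

/* Convert to latin1 bytes for a client whose character set is latin1.
 * Returns the number of bytes written, or -1 if the output buffer is
 * too small, the input is malformed, or a character is > 255. */
long demo_to_latin1(const demo_str *s, unsigned char *out, size_t cap)
{
    size_t i = 0, n = 0;

    if (!s->is_utf8) {              /* latin1 internal: one byte per char */
        if (s->len > cap) return -1;
        memcpy(out, s->bytes, s->len);
        return (long)s->len;
    }
    while (i < s->len) {            /* utf8 internal: decode each char */
        unsigned char b = s->bytes[i];
        if (n >= cap) return -1;
        if (b < 0x80) {             /* ASCII, single byte */
            out[n++] = b;
            i += 1;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < s->len
                   && (s->bytes[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            out[n++] = (unsigned char)(((b & 0x03) << 6)
                                       | (s->bytes[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;              /* > 255 or malformed */
        }
    }
    return (long)n;
}

/* Convert to UTF-8 bytes for a client whose character set is utf8.
 * Returns the number of bytes written, or -1 if the buffer is too small. */
long demo_to_utf8(const demo_str *s, unsigned char *out, size_t cap)
{
    size_t i, n = 0;

    if (s->is_utf8) {               /* already UTF-8 encoded: copy */
        if (s->len > cap) return -1;
        memcpy(out, s->bytes, s->len);
        return (long)s->len;
    }
    for (i = 0; i < s->len; i++) {  /* latin1 internal: encode each byte */
        unsigned char b = s->bytes[i];
        if (b < 0x80) {
            if (n + 1 > cap) return -1;
            out[n++] = b;
        } else {
            if (n + 2 > cap) return -1;
            out[n++] = (unsigned char)(0xC0 | (b >> 6));
            out[n++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return (long)n;
}
```

With this sketch, the character U+00E9 yields the single latin1 byte 0xE9 from demo_to_latin1 whether it is stored as the byte 0xE9 with the flag off or as the bytes 0xC3 0xA9 with the flag on. Passing the raw buffer straight through would send different bytes to the server in the two cases.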