On Wed, Sep 10, 2003 at 11:42:23AM +0100, Steve Hay wrote:
> Bart Lateur wrote:
>
> >On Wed, 10 Sep 2003 10:40:29 +0100, Steve Hay wrote:
> >
> >>But the question was: How can I arrange for such conversions to be
> >>performed automatically by DBI whenever it receives or returns data?
> >
> >Well, there are two options... either the database stores a flag
> >somewhere indicating that some string is in UTF8, or you have to add that
> >information yourself. For the latter, I don't know if it'll actually
> >work, but it seems like an appropriate way to do it: add a "BOM" marker
> >at the start of the string.
>
> I don't think MySQL 3.x stores any flag to indicate that a string is
> UTF8, and even if it did I'm not aware of anything in DBI or DBD-mysql
> that would make use of it, e.g. to decode data flagged in such a way
> into Perl's internal format.
>
> Adding a BOM myself to the string seems to have problems of its own (see
> http://www.unicode.org/unicode/faq/utf_bom.html#27), and again I'm not
> aware of DBI / DBD-mysql having anything in them that would make use of
> such a BOM. Please correct me if I'm wrong - that could be just the
> sort of thing that I'm looking for here.

Basically it should be the job of the drivers to set the utf8 flag
on data being retrieved if it is utf8.

I believe that the new mysql v4.1 protocol does provide information
about the character set of each column. DBD::mysql can use that.

For people stuck with older versions of mysql, a driver private option
could be used to indicate that all char fields are utf8, or there could
be some way of indicating that per column, such as

    $sth->bind_col(1, undef, { mysql_charset => 'utf8' });

Tim.
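
Until a driver does this itself, the same effect can be approximated in application code. The following is only a minimal sketch under stated assumptions: decode_row and its column-to-charset map are hypothetical names invented for illustration, the row is simulated rather than fetched through DBI, and the mysql_charset attribute above is a proposal, not an option known to exist. It uses the core Encode module to turn raw UTF-8 bytes into a Perl character string with the utf8 flag set.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of DBI or DBD::mysql): decode selected
# columns of a fetched row in place. $charset_for maps a 0-based column
# index to a charset name, standing in for the per-column information
# a driver would need.
sub decode_row {
    my ($row, $charset_for) = @_;
    for my $i (keys %$charset_for) {
        next unless defined $row->[$i];    # leave NULLs alone
        $row->[$i] = decode($charset_for->{$i}, $row->[$i]);
    }
    return $row;
}

# Simulated row as a driver would hand it back today: column 1 holds
# raw UTF-8 bytes, not yet marked as characters.
my $row = [ 1, encode('UTF-8', "caf\x{e9}") ];

decode_row($row, { 1 => 'UTF-8' });

printf "length=%d utf8_flag=%d\n",
    length($row->[1]), utf8::is_utf8($row->[1]) ? 1 : 0;
```

In real use the array reference would come from something like $sth->fetchrow_arrayref, and the map would list whichever char columns are known to hold UTF-8. Doing the decode per column rather than for the whole row keeps binary columns untouched.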