Tim Bunce wrote:
On Wed, Sep 10, 2003 at 11:42:23AM +0100, Steve Hay wrote:
Bart Lateur wrote:
On Wed, 10 Sep 2003 10:40:29 +0100, Steve Hay wrote:
But the question was: How can I arrange for such conversions to be
performed automatically by DBI whenever it receives or returns data?
Well, there are two options: either the database stores a flag somewhere
indicating that a string is UTF-8, or you have to add that information
yourself. For the latter, I don't know whether it will actually work,
but it seems like an appropriate way to do it: add a "BOM" marker
at the start of the string.
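A minimal sketch of that BOM idea, done by hand around DBI calls. The helper names to_db and from_db are hypothetical; nothing in DBI or DBD-mysql is assumed to do this for you, and it relies only on the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The UTF-8 encoding of U+FEFF, used here as an in-band "this is UTF-8" marker.
my $BOM = "\xEF\xBB\xBF";

# Hypothetical helper: call on a character string before passing it to DBI.
sub to_db {
    my ($chars) = @_;
    return $BOM . encode('UTF-8', $chars);
}

# Hypothetical helper: call on each value that DBI returns.
# Values without the marker are passed through untouched as bytes.
sub from_db {
    my ($bytes) = @_;
    return $bytes unless defined $bytes && index($bytes, $BOM) == 0;
    return decode('UTF-8', substr($bytes, length $BOM));
}
```

Every stored value grows by three bytes, and anything else that reads the table will see the marker, so this would only suit columns that are read and written exclusively through such helpers.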
I don't think MySQL 3.x stores any flag to indicate that a string is
UTF-8, and even if it did I'm not aware of anything in DBI or DBD-mysql
that would make use of it, e.g. to decode data flagged in such a way
into Perl's internal format.
Adding a BOM myself to the string seems to have problems of its own (see
http://www.unicode.org/unicode/faq/utf_bom.html#27), and again I'm not
aware of DBI / DBD-mysql having anything in them that would make use of
such a BOM. Please correct me if I'm wrong - that could be just the
sort of thing that I'm looking for here.
Basically it should be the job of the drivers to set the utf8 flag on
data being retrieved if it is UTF-8. I believe that the new MySQL v4.1
protocol does provide information about the character set of each column.
DBD::mysql can use that.
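To illustrate what "setting the utf8 flag" changes from Perl's point of view, here is a small sketch using the built-in utf8::decode and utf8::is_utf8 functions. The byte string stands in for a value fetched from the database; a driver would do the equivalent in its C code rather than like this.

```perl
use strict;
use warnings;

# Bytes as a driver might hand them back today: the UTF-8 encoding of U+00E9.
my $raw = "\xC3\xA9";

# Perl sees two separate bytes here, with no utf8 flag set.
my $raw_length  = length $raw;
my $raw_flagged = utf8::is_utf8($raw) ? 1 : 0;

# utf8::decode checks that the bytes are valid UTF-8 and, if so,
# converts the scalar in place to a character string.
my $text = $raw;
utf8::decode($text);

# Now Perl sees one character, and the flag is on.
my $text_length  = length $text;
my $text_flagged = utf8::is_utf8($text) ? 1 : 0;

print "raw:  length=$raw_length flag=$raw_flagged\n";
print "text: length=$text_length flag=$text_flagged\n";
```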
Ah. In that case, I should get onto the DBD-mysql people to look for
assistance. I was thinking that DBI itself would be adding some kind of
UTF-8 support.
For people stuck with older versions of MySQL, a driver private
option could be used to indicate that all char fields are utf8,
or there could be some way of indicating that per column, such as

    $sth->bind_col(1, undef, { mysql_charset => 'utf8' });
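Until a driver offers something like that, the per-column idea can be approximated in application code. This is only a sketch: decode_columns is a hypothetical helper, the column-to-charset map is something you would maintain yourself, and it works on a plain array so it can be tried without a database.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: given one fetched row (array ref) and a map of
# 0-based column index => charset name, return a new row in which the
# listed columns are decoded to character strings. Other columns, and
# NULLs (undef), are copied through unchanged.
sub decode_columns {
    my ($row, $charset_for) = @_;
    my @out = @$row;    # copy, so the caller's row is not modified
    for my $i (keys %$charset_for) {
        next unless defined $out[$i];
        $out[$i] = decode($charset_for->{$i}, $out[$i]);
    }
    return \@out;
}

# Example row: column 0 holds UTF-8 text, column 1 is opaque binary data.
my $row     = [ "\xC3\xA9", "\xC3\xA9", undef ];
my $decoded = decode_columns($row, { 0 => 'UTF-8' });
```

With a real statement handle, each row from $sth->fetchrow_arrayref could be passed through the helper in the fetch loop.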
OK, I'll pass this suggestion on to the DBD-mysql maintainer(s).
Thanks,
- Steve