On 31-Aug-2006 Daniel Zeman wrote:
> Hi,
> I wonder if anyone can help me.
> 
> I am using
> - Debian Linux
> - Perl 5.8.8
> - DBI (I do not know how to figure out its version)

perl -MDBI -e 'print "$DBI::VERSION\n";'

> - MySQL 5.0.22-Debian_3-log

What DBD::mysql version are you using?

perl -M'DBD::mysql' -e 'print "$DBD::mysql::VERSION\n";'

> I want to store and handle data in UTF-8 but so far I have not been able 
> to force Perl/DBI to do so.
> 
> I have created a table using
> 
>      my $sql = "CREATE TABLE $tbl (".join(", ", @columns).") CHARACTER 
> SET utf8 COLLATE utf8_czech_ci;";
>      $dbh->do($sql);
> 
> I feeded the table with data using
> 
>      my $list_of_columns = join(", ", @names);
>      my $list_of_values = join(", ", map{"_utf8'$record->{$_}'"}(@names));
>      my $sql = "INSERT INTO $tbl ($list_of_columns) VALUES 
> ($list_of_values);";
>      $dbh->do($sql);

You would be better using bound parameters as then only the data can be utf-8.

> I have looked into the database using phpMyAdmin 2.8.2-Debian-0.1 and it 
> really looked like the data were stored in correct UTF-8.
> 
> However, when I retrieve the data from Perl/DBI, something in the chain 
> (MySQL? the driver? DBI?) decides that another encoding (probably, 
> Latin1) would be better for me. It "converts" the strings from UTF-8 to 
> that encoding, which means, at the time the data arrives in my Perl 
> code, all the non-Latin1 characters have already been irrecoverably 
> converted to question marks. I would be happy to decode the data myself 
> but there is nothing I can do with the question marks.
> 
> I am using the following code to retrieve the data:
> 
>      my $sql = "SELECT kod, hry.nazev FROM hry INNER JOIN prodej ON 
> hry.kod = prodej.kod_hry GROUP BY kod, hry.nazev";
>      my $sqlobj = $dbh->prepare($sql);
>      $sqlobj->execute();
>      while(my ($kod, $nazev) = $sqlobj->fetchrow_array())
>      {
>          ...
>      }
> 
> So far, the only workaround I have, is not to tell the DBI the data is 
> UTF-8 when I am inserting it (i.e., drop the "_utf8" part before the 
> single quote), and use Encode; decode("utf8", ...) on anything I fetch 
> from the database. This way, the database never knows the data was a 
> UTF-8 text, treats the bytes as Latin1 characters and returns them 
> undisturbed. However, I cannot access the data using phpMyAdmin (unless 
> I en/decode UTF in my brain), the string lengths do not reflect the 
> reality etc.
> 
> Is there a better way to do it? I think there must be some small stupid 
> locale-like setting telling the machine that I am a UTF guy. But the 
> settings I was able to come up with did not help and I actually have no 
> idea which part of the MySQL-driver-DBI-Perl chain is responsible.
> 
> Any hints are welcome.
> Thanks
> 
> Dan

I posted a small patch (in dbi-dev a couple of months ago) to a recent(ish)
DBD::mysql which allows you to retrieve utf-8 data marked as utf-8 in perl. If
is not a complete solution but works for most things.

See
http://www.nntp.perl.org/group/perl.dbi.dev/4548

It may have been included in the 3.0006_1 development release of DBD::mysql.

You also need to look up "set character set utf8" and "set names utf8" in mysql
documentation.

Martin
--
Martin J. Evans
Easysoft Ltd, UK
http://www.easysoft.com

Reply via email to