Martin J. Evans napsal(a):
> On 31-Aug-2006 Daniel Zeman wrote:
>> Hi,
>> I wonder if anyone can help me.
>>
>> I am using
>> - Debian Linux
>> - Perl 5.8.8
>> - DBI (I do not know how to figure out its version)
>
> perl -MDBI -e 'print "$DBI::VERSION\n";'
1.51
>
>> - MySQL 5.0.22-Debian_3-log
>
> What DBD::mysql version are you using?
>
> perl -M'DBD::mysql' -e 'print "$DBD::mysql::VERSION\n";'
3.0004
>
>> I want to store and handle data in UTF-8 but so far I have not been
able
>> to force Perl/DBI to do so.
>>
>> I have created a table using
>>
>> my $sql = "CREATE TABLE $tbl (".join(", ", @columns).") CHARACTER
>> SET utf8 COLLATE utf8_czech_ci;";
>> $dbh->do($sql);
>>
>> I feeded the table with data using
>>
>> my $list_of_columns = join(", ", @names);
>> my $list_of_values = join(", ",
map{"_utf8'$record->{$_}'"}(@names));
>> my $sql = "INSERT INTO $tbl ($list_of_columns) VALUES
>> ($list_of_values);";
>> $dbh->do($sql);
>
> You would be better using bound parameters as then only the data can
be utf-8.
>
>> I have looked into the database using phpMyAdmin 2.8.2-Debian-0.1
and it
>> really looked like the data were stored in correct UTF-8.
>>
>> However, when I retrieve the data from Perl/DBI, something in the chain
>> (MySQL? the driver? DBI?) decides that another encoding (probably,
>> Latin1) would be better for me. It "converts" the strings from UTF-8 to
>> that encoding, which means, at the time the data arrives in my Perl
>> code, all the non-Latin1 characters have already been irrecoverably
>> converted to question marks. I would be happy to decode the data myself
>> but there is nothing I can do with the question marks.
>>
>> I am using the following code to retrieve the data:
>>
>> my $sql = "SELECT kod, hry.nazev FROM hry INNER JOIN prodej ON
>> hry.kod = prodej.kod_hry GROUP BY kod, hry.nazev";
>> my $sqlobj = $dbh->prepare($sql);
>> $sqlobj->execute();
>> while(my ($kod, $nazev) = $sqlobj->fetchrow_array())
>> {
>> ...
>> }
>>
>> So far, the only workaround I have, is not to tell the DBI the data is
>> UTF-8 when I am inserting it (i.e., drop the "_utf8" part before the
>> single quote), and use Encode; decode("utf8", ...) on anything I fetch
>> from the database. This way, the database never knows the data was a
>> UTF-8 text, treats the bytes as Latin1 characters and returns them
>> undisturbed. However, I cannot access the data using phpMyAdmin (unless
>> I en/decode UTF in my brain), the string lengths do not reflect the
>> reality etc.
>>
>> Is there a better way to do it? I think there must be some small stupid
>> locale-like setting telling the machine that I am a UTF guy. But the
>> settings I was able to come up with did not help and I actually have no
>> idea which part of the MySQL-driver-DBI-Perl chain is responsible.
>>
>> Any hints are welcome.
>> Thanks
>>
>> Dan
>
> I posted a small patch (in dbi-dev a couple of months ago) to a
recent(ish)
> DBD::mysql which allows you to retrieve utf-8 data marked as utf-8 in
perl. If
> is not a complete solution but works for most things.
>
> See
> http://www.nntp.perl.org/group/perl.dbi.dev/4548
>
> It may have been included in the 3.0006_1 development release of
DBD::mysql.
>
> You also need to look up "set character set utf8" and "set names
utf8" in mysql
> documentation.
This is it! Thanks a lot! Just doing
$dbh->prepare("SET NAMES 'utf8'")->execute();
after connecting solved my problem. It seems that I am getting the
strings with the utf8 flag on so hopefully I can live without your patch
(or someone has already patched the system I am on) - but I am going
to store the link for future reference. Thanks!
Dan