Hi,
I wonder if anyone can help me.
I am using
- Debian Linux
- Perl 5.8.8
- DBI (I do not know how to figure out its version)
- MySQL 5.0.22-Debian_3-log
I want to store and handle data in UTF-8 but so far I have not been able
to force Perl/DBI to do so.
I have created a table using
my $sql = "CREATE TABLE $tbl (".join(", ", @columns).") CHARACTER
SET utf8 COLLATE utf8_czech_ci;";
$dbh->do($sql);
I feeded the table with data using
my $list_of_columns = join(", ", @names);
my $list_of_values = join(", ", map{"_utf8'$record->{$_}'"}(@names));
my $sql = "INSERT INTO $tbl ($list_of_columns) VALUES
($list_of_values);";
$dbh->do($sql);
I have looked into the database using phpMyAdmin 2.8.2-Debian-0.1 and it
really looked like the data were stored in correct UTF-8.
However, when I retrieve the data from Perl/DBI, something in the chain
(MySQL? the driver? DBI?) decides that another encoding (probably,
Latin1) would be better for me. It "converts" the strings from UTF-8 to
that encoding, which means, at the time the data arrives in my Perl
code, all the non-Latin1 characters have already been irrecoverably
converted to question marks. I would be happy to decode the data myself
but there is nothing I can do with the question marks.
I am using the following code to retrieve the data:
my $sql = "SELECT kod, hry.nazev FROM hry INNER JOIN prodej ON
hry.kod = prodej.kod_hry GROUP BY kod, hry.nazev";
my $sqlobj = $dbh->prepare($sql);
$sqlobj->execute();
while(my ($kod, $nazev) = $sqlobj->fetchrow_array())
{
...
}
So far, the only workaround I have, is not to tell the DBI the data is
UTF-8 when I am inserting it (i.e., drop the "_utf8" part before the
single quote), and use Encode; decode("utf8", ...) on anything I fetch
from the database. This way, the database never knows the data was a
UTF-8 text, treats the bytes as Latin1 characters and returns them
undisturbed. However, I cannot access the data using phpMyAdmin (unless
I en/decode UTF in my brain), the string lengths do not reflect the
reality etc.
Is there a better way to do it? I think there must be some small stupid
locale-like setting telling the machine that I am a UTF guy. But the
settings I was able to come up with did not help and I actually have no
idea which part of the MySQL-driver-DBI-Perl chain is responsible.
Any hints are welcome.
Thanks
Dan