Hi,
I wonder if anyone can help me.

I am using
- Debian Linux
- Perl 5.8.8
- DBI (I do not know how to figure out its version)
- MySQL 5.0.22-Debian_3-log

I want to store and handle data in UTF-8 but so far I have not been able to force Perl/DBI to do so.

I have created a table using

my $sql = "CREATE TABLE $tbl (".join(", ", @columns).") CHARACTER SET utf8 COLLATE utf8_czech_ci;";
    $dbh->do($sql);

I feeded the table with data using

    my $list_of_columns = join(", ", @names);
    my $list_of_values = join(", ", map{"_utf8'$record->{$_}'"}(@names));
my $sql = "INSERT INTO $tbl ($list_of_columns) VALUES ($list_of_values);";
    $dbh->do($sql);

I have looked into the database using phpMyAdmin 2.8.2-Debian-0.1 and it really looked like the data were stored in correct UTF-8.

However, when I retrieve the data from Perl/DBI, something in the chain (MySQL? the driver? DBI?) decides that another encoding (probably, Latin1) would be better for me. It "converts" the strings from UTF-8 to that encoding, which means, at the time the data arrives in my Perl code, all the non-Latin1 characters have already been irrecoverably converted to question marks. I would be happy to decode the data myself but there is nothing I can do with the question marks.

I am using the following code to retrieve the data:

my $sql = "SELECT kod, hry.nazev FROM hry INNER JOIN prodej ON hry.kod = prodej.kod_hry GROUP BY kod, hry.nazev";
    my $sqlobj = $dbh->prepare($sql);
    $sqlobj->execute();
    while(my ($kod, $nazev) = $sqlobj->fetchrow_array())
    {
        ...
    }

So far, the only workaround I have, is not to tell the DBI the data is UTF-8 when I am inserting it (i.e., drop the "_utf8" part before the single quote), and use Encode; decode("utf8", ...) on anything I fetch from the database. This way, the database never knows the data was a UTF-8 text, treats the bytes as Latin1 characters and returns them undisturbed. However, I cannot access the data using phpMyAdmin (unless I en/decode UTF in my brain), the string lengths do not reflect the reality etc.

Is there a better way to do it? I think there must be some small stupid locale-like setting telling the machine that I am a UTF guy. But the settings I was able to come up with did not help and I actually have no idea which part of the MySQL-driver-DBI-Perl chain is responsible.

Any hints are welcome.
Thanks

Dan

Reply via email to