Hello Jesse

I'm pretty sure your data has been UTF-8 encoded twice. Consider this example:

use strict;
use warnings;

use Encode;

# $string is UTF-8, but Perl doesn't know
my $string = 'Pérez-Reverte, Arturo Кири́ллица ქართული  汉字 / 漢';
# $double_utf8 contains the double UTF-8 encoded string
# note that this is an implicit ISO-8859-1 to UTF-8 conversion
my $double_utf8 = Encode::encode('UTF-8', $string);

print "double encoded UTF-8:\n", "$double_utf8\n\n";

# decode one layer of UTF-8, which leaves one character per original octet,
# then map those characters back to octets (the UTF-8 to ISO-8859-1 step)
my $double_utf8_to_latin1 =
    Encode::encode('ISO-8859-1', Encode::decode('UTF-8', $double_utf8));

print "double UTF-8 to ISO-8859-1:\n", "$double_utf8_to_latin1\n\n";

So why is the data in your database double encoded UTF-8? The problem is that 
you're not using the mysql_enable_utf8 option (see the DBD::mysql 
documentation). If you don't pass that option as part of the call to 
'connect()', DBD::mysql will configure the connection in a way that makes MySQL 
believe it is being sent ISO-8859-1. Because your table is configured to 
store character data as UTF-8, MySQL converts the received data from ISO-8859-1 
to UTF-8. There you have double encoded UTF-8!
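
To make the effect visible at the byte level, here is a minimal sketch that 
uses only Encode and a single character. It mimics the server-side conversion 
described above; the variable names are made up for illustration, and the 
output is printed as hex so it does not depend on your terminal's encoding.

```perl
use strict;
use warnings;
use Encode ();

# "é" (U+00E9) as UTF-8 octets
my $utf8_once = "\xC3\xA9";

# Treat each octet as an ISO-8859-1 character and encode to UTF-8 again.
# This imitates what happens when the receiver assumes the input is latin1.
my $utf8_twice =
    Encode::encode('UTF-8', Encode::decode('ISO-8859-1', $utf8_once));

# Render both octet strings as hex
my $hex_once  = join ' ', map { sprintf '%02X', ord($_) } split //, $utf8_once;
my $hex_twice = join ' ', map { sprintf '%02X', ord($_) } split //, $utf8_twice;

print "once:  $hex_once\n";   # C3 A9
print "twice: $hex_twice\n";  # C3 83 C2 A9
```

The two octets of the original character each become a two-octet sequence, 
which is why double encoded text shows up as pairs like "Ã©" when displayed.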

The solution is simply to pass mysql_enable_utf8 as part of the call to 
'connect()'. If you're using DBIx::Class, I also recommend disabling the 
mysql_auto_reconnect option. This will save you a lot of headaches.
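
A minimal sketch of such a connection with plain DBI follows. The DSN, 
database name and credentials are placeholders I made up, so substitute your 
own; only the two mysql_* attributes are the point here.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical DSN and credentials, for illustration only
my $dsn      = 'DBI:mysql:database=mydb;host=localhost';
my $user     = 'dbuser';
my $password = 'secret';

my $dbh = DBI->connect($dsn, $user, $password, {
    RaiseError           => 1,
    mysql_enable_utf8    => 1,  # treat the connection as UTF-8
    mysql_auto_reconnect => 0,  # no silent reconnects
});
```

With DBIx::Class the same attribute hash should be usable as the fourth 
argument to your schema's 'connect()', for example 
MyApp::Schema->connect($dsn, $user, $password, \%attrs), where MyApp::Schema 
stands in for your own schema class.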

Regards
Matias E. Fernandez
