Hello
My problem is accurately described in this ticket [1]:
> My issue is that DBD::mysql passes all data as-is to the database even when
> the connection is in utf8 mode. This way all non ASCII characters of
> non-utf8-tagged strings gets lost in the database. But passing
> non-utf8-tagged strings to DBD::mysql should be absolutely valid, since
> they're valid for Perl they should be valid for DBD::mysql as well.
This is about "The Unicode Bug" [2] and will cause the following test to fail:
my $title = "\x{e4}\x{f6}\x{fc}"; # "äöü"
$album->title($title);
$album->update();
$album->discard_changes();
is($album->title(), $title,
    "UTF-8 column survives read/write cycle and preserves character semantics");
Relying on the format in which Perl internally holds strings is a bad idea,
especially since [3]:
> by default, the internal format is either ISO-8859-1 (latin-1), or utf8,
> depending on the history of the string.
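To illustrate the quote above, here is a minimal sketch using only the core
utf8:: functions. The variable names are mine. It shows two strings that Perl
considers equal even though they are held in different internal formats:

```perl
use strict;
use warnings;

# Two copies of the same three characters with different histories.
my $native   = "\x{e4}\x{f6}\x{fc}";  # held internally as Latin-1 bytes
my $upgraded = $native;
utf8::upgrade($upgraded);             # same characters, held internally as UTF-8

# Perl treats both strings as identical text ...
printf "equal:     %s\n", $native eq $upgraded ? "yes" : "no";
printf "length:    %d and %d\n", length($native), length($upgraded);

# ... yet the internal representation (the UTF8 flag) differs.
printf "utf8 flag: %d and %d\n",
    utf8::is_utf8($native) ? 1 : 0, utf8::is_utf8($upgraded) ? 1 : 0;
```

Any code that sends the internal buffer as-is will therefore send different
octets for these two strings.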
The rule of thumb for handling data in a program is outlined in "I/O flow
(the actual 5 minute tutorial)" [4]:
> 1. Receive and decode
> 2. Process
> 3. Encode and output
The first step is handled by DBD::mysql, but the third is not! Thus, if I want
to communicate with a database in UTF-8, I need to encode my data myself, from
whatever format Perl currently holds it in, to UTF-8.
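The three steps from [4] can be sketched with the Encode module as follows.
This is only an illustration with made-up variable names and no database
involved:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# 1. Receive and decode: octets from the outside become characters.
my $bytes_in = "\xc3\xa4\xc3\xb6\xc3\xbc";   # three characters as UTF-8 octets
my $text     = decode('UTF-8', $bytes_in);

# 2. Process: operate on characters, not on octets.
my $upper = uc $text;

# 3. Encode and output: characters become octets again.
my $bytes_out = encode('UTF-8', $upper);

printf "%d octets in, %d characters, %d octets out\n",
    length($bytes_in), length($text), length($bytes_out);
```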
I checked the code base of DBD::mysql [5] and found no place where data would
be encoded to UTF-8, but I did find a test file [6] where a string like this
my $blob = "\x{c4}\x{80}dam"; # same as utf8_str but not utf8 encoded
is tested for the UTF8 flag after a read/write cycle to the database. I'm
not sure whether this is a correct test case, because $blob is not really a
blob, but a string that suffers from "The Unicode Bug" [2]. I do understand the
problem that blobs should not get decoded or encoded. However, I think that the
correct approach according to [4] would be to
Encode::encode('UTF-8', $data);
before sending it to the database, if mysql_enable_utf8 is being used.
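A small sketch of what this encoding step does, and why real blobs must be
excluded from it. The value of $utf8_str is my assumption, derived from the
comment in the test file; the other names are made up for the example:

```perl
use strict;
use warnings;
use Encode ();

my $utf8_str = "\x{100}dam";        # assumed: character U+0100 followed by "dam"
my $blob     = "\x{c4}\x{80}dam";   # five octets, meant to be stored unchanged

# Encoding the character string yields exactly the octets held in $blob.
my $encoded_str = Encode::encode('UTF-8', $utf8_str);

# Encoding the blob too would treat \xc4 and \x80 as characters and turn
# each of them into two octets, which alters the binary data.
my $encoded_blob = Encode::encode('UTF-8', $blob);

printf "text: %d octets, blob after encoding: %d octets\n",
    length($encoded_str), length($encoded_blob);
```

So the encoding would have to be applied to text columns only, which the
driver cannot infer from the UTF8 flag alone.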
One way to avoid these issues is to use DBIx::Class::UTF8Columns, but this
seems to be deprecated because it suffers from a bug [7]
> deep in the core of DBIx::Class which affects any component attempting to
> perform encoding/decoding by overloading store_column and get_columns. As a
> result of this problem create sends the original column values to the
> database, while update sends the encoded values. DBIx::Class::UTF8Columns and
> DBIx::Class::ForceUTF8 are both affected by this bug.
We have come up with a solution that makes use of DBIx::Class::InflateColumn,
which is described as follows [8]:
> This component translates column data into references, i.e. "inflating" the
> column data. It also "deflates" references into an appropriate format for the
> database.
This seems like the right tool to decode and encode data on its way to and from
the database:
__PACKAGE__->inflate_column('title' => {
    inflate => sub {
        my ($value, $row_obj) = @_;
        # DBD should have already done decoding
        return $value;
    },
    deflate => sub {
        my ($value, $row_obj) = @_;
        # Always encode, as DBD won't do it
        return Encode::encode('UTF-8', $value);
    },
});
Note that in the above example I assume that mysql_enable_utf8 is being used!
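The intended round trip can be checked without a database. In the sketch below
the deflate callback is the one from above, while $simulated_fetch is a
hypothetical stand-in for the driver's read path; it assumes that
mysql_enable_utf8 decodes UTF-8 octets into characters on fetch:

```perl
use strict;
use warnings;
use Encode ();

# Same logic as the deflate callback passed to inflate_column().
my $deflate = sub {
    my ($value) = @_;
    return Encode::encode('UTF-8', $value);
};

# Hypothetical stand-in for what the driver is assumed to do on read.
my $simulated_fetch = sub {
    my ($octets) = @_;
    return Encode::decode('UTF-8', $octets);
};

my $title  = "\x{e4}\x{f6}\x{fc}";
my $stored = $deflate->($title);            # octets that would be sent
my $read   = $simulated_fetch->($stored);   # characters that would come back

printf "stored %d octets, read back %d characters, round trip %s\n",
    length($stored), length($read), $read eq $title ? "ok" : "broken";
```

This says nothing about how DBIx::Class calls the callbacks on create versus
update, which is exactly the part I am unsure about.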
As I have not found the bug description mentioned in [7], I would like to ask
whether this solution suffers from the same issues as DBIx::Class::UTF8Columns.
Regards
Matias E. Fernandez
[1] https://rt.cpan.org/Public/Bug/Display.html?id=25590#txn-300430
[2] http://perldoc.perl.org/5.12.0/perlunicode.html#The-%22Unicode-Bug%22
[3] http://perldoc.perl.org/5.12.0/perlunifaq.html#I-lost-track%3b-what-encoding-is-the-internal-format-really%3f
[4] http://perldoc.perl.org/5.12.0/perlunitut.html#I%2fO-flow-(the-actual-5-minute-tutorial)
[5] http://search.cpan.org/dist/DBD-mysql/
[6] http://cpansearch.perl.org/src/CAPTTOFU/DBD-mysql-4.014/t/55utf8.t
[7] http://search.cpan.org/~frew/DBIx-Class-0.08121/lib/DBIx/Class/UTF8Columns.pm#Warning_-_Module_does_not_function_properly_on_create/insert
[8] http://search.cpan.org/~frew/DBIx-Class-0.08121/lib/DBIx/Class/InflateColumn.pm