On Wed, May 13, 2015 at 02:29:03PM -0400, Uri Guttman wrote:

> i have a db which doesn't like unicode (and to get it to accept it may
> require a complete debug of the DBD stack!). we get a rare error of a
> unicode char in a string. what is the easiest way to just delete that
> char and note that it was deleted? it is in a perl scalar and not marked
> as unicode or anything. it is supposed to be simple text but sometimes
> we get a DB blowup of a unicode char found.
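For the question exactly as asked (delete the offending character and
note that it was deleted), here is a minimal sketch. It assumes that
"unicode char" means anything outside 7-bit ASCII and that a warning on
STDERR is good enough as a note; strip_non_ascii is a hypothetical
helper name. It works whether the scalar holds decoded characters or
raw UTF-8 bytes, but in the raw-bytes case each byte of a multi-byte
sequence is removed and reported separately.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text with every non-ASCII character
# removed, followed by the list of characters that were removed.
sub strip_non_ascii {
    my ($text) = @_;
    my @removed = $text =~ /([^\x00-\x7F])/g;   # remember what we drop
    $text =~ s/[^\x00-\x7F]//g;                 # then drop it
    return ($text, @removed);
}

my ($clean, @gone) = strip_non_ascii("kitt\x{e9}n and a snowman \x{2603}");
if (@gone) {
    # "note that it was deleted": log each removed code point
    warn sprintf "deleted %d non-ASCII character(s): %s\n",
        scalar(@gone), join(' ', map { sprintf 'U+%04X', ord } @gone);
}
print "$clean\n";
```

Dropping characters silently changes the stored data, so logging the
code points (as above) at least leaves a trail.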
If the database is MySQL then there is a known and serious bug in
DBD::mysql that apparently can't be fixed because fixing it would break
backward compatibility. In my DBIx::Class-based app I have a
workaround, which you may be able to adapt.

The bug is that even if you've configured everything to be UTF-8, data
consisting solely of ASCII plus Latin-1 characters gets its non-ASCII
characters mis-encoded:

  kitten:  ASCII-only, so correctly encoded
  kiĳtten: ASCII plus non-Latin-1, so correctly encoded
  kiĳttén: ĳ isn't Latin-1, so é is correctly encoded
  kittén:  no non-Latin-1 characters, so é incorrectly encoded

(In case they don't render properly for you, the funny foreign
characters there are the ij-ligature and e-acute.)

There's another MySQL UTF-8 bug which prevents it from correctly storing
any four-byte UTF-8 character, even if it's encoded properly when you
pass it to the database. See https://goo.gl/WGPn1a for the fix.

Anyway, here's my DBIx::Class workaround. DBIx::Class doesn't expose a
suitable hook, so I've monkey-patched DBIx::Class::Storage::DBI:

    # workaround for bug in DBD::mysql that let us write ij-ligature
    # and snowman to the db, but not i-acute
    BEGIN {
        # make sure the original sub is loaded before we take a
        # reference to it, otherwise we'd wrap an empty stub
        require DBIx::Class::Storage::DBI;
        my $old_ex = \&DBIx::Class::Storage::DBI::_dbh_execute;
        my $new_ex = sub {
            # we need to mangle $bind:
            # my ($self, $dbh, $sql, $bind, $bind_attrs) = @_;
            #
            # It looks like ...
            # [
            #   [ ... ],
            #   ...
            #   [
            #     {
            #       dbic_colname => 'message'
            #     },
            #     'some data',
            #   ],
            #   [
            #     {
            #       dbic_colname => 'method'
            #     },
            #     undef,
            #   ],
            #   ...
            # ]
            # and we need to edit it in place
            foreach (@{$_[3]}) {
                if(exists($_->[1]) && defined($_->[1])) {
                    utf8::upgrade($_->[1])
                }
            }
            return $old_ex->(@_);
        };
        {
            no strict qw/ refs /;
            no warnings 'redefine';
            *DBIx::Class::Storage::DBI::_dbh_execute = $new_ex;
        }
    }

-- 
David Cantrell | http://www.cantrell.org.uk/david

"The whole aim of practical politics is to keep the populace alarmed
(and hence clamorous to be led to safety) by menacing it with an
endless series of hobgoblins, all of them imaginary"
  -- H. L. Mencken

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
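Why utf8::upgrade helps in the DBIx::Class workaround above, sketched
without a database. My reading of the kitten examples (an assumption,
not confirmed from the DBD::mysql source) is that the driver sends a
string's internal buffer as-is: strings whose characters all fit in
Latin-1 are usually stored as single bytes without Perl's internal
UTF8 flag, so é goes out as one Latin-1 byte. utf8::upgrade switches
the internal representation to UTF-8 without changing the string's
characters. The package My::Storage below is a hypothetical stand-in
for DBIx::Class::Storage::DBI, wrapped with the same typeglob trick.

```perl
use strict;
use warnings;

# The kitten examples. chr() below 256 yields a byte (non-UTF8-flagged)
# string; chr(0x133), the ij-ligature, forces the internal UTF-8 form.
my %sample = (
    ascii_only          => 'kitten',
    with_ligature       => 'ki' . chr(0x133) . 'tten',
    ligature_and_eacute => 'ki' . chr(0x133) . 'tt' . chr(0xE9) . 'n',
    eacute_only         => 'kitt' . chr(0xE9) . 'n',   # the problem case
);
for my $name (sort keys %sample) {
    printf "%-20s UTF8 flag: %d\n", $name,
        utf8::is_utf8($sample{$name}) ? 1 : 0;
}

# Hypothetical stand-in for the real storage class: instead of talking
# to a database it reports whether each bound value is UTF8-flagged.
package My::Storage;
sub _dbh_execute {
    my ($self, $dbh, $sql, $bind) = @_;
    return [ map { defined $_->[1] && utf8::is_utf8($_->[1]) ? 1 : 0 }
             @$bind ];
}

package main;

# Same wrapping pattern as the workaround: keep the old sub, install a
# new one that upgrades bind values in place, then delegate.
{
    my $old_ex = \&My::Storage::_dbh_execute;
    no warnings 'redefine';
    *My::Storage::_dbh_execute = sub {
        foreach (@{ $_[3] }) {
            utf8::upgrade($_->[1]) if defined $_->[1];
        }
        return $old_ex->(@_);
    };
}

# Bind structure shaped like the one in the comment above.
my $bind = [
    [ { dbic_colname => 'message' }, 'kitt' . chr(0xE9) . 'n' ],
    [ { dbic_colname => 'method' },  undef ],
];
my $before = utf8::is_utf8($bind->[0][1]) ? 1 : 0;
my $flags  = My::Storage->_dbh_execute(undef, 'INSERT ...', $bind);
print "before: $before, seen by storage: @$flags\n";
```

Because $_[3] is the caller's own arrayref, the upgrade is visible to
the wrapped sub and to the caller, and the value still compares equal
to the original string.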

