On Wed, May 13, 2015 at 02:29:03PM -0400, Uri Guttman wrote:

> i have a db which doesn't like unicode (and to get it to accept it may 
> require a complete debug of the DBD stack!). we get a rare error of a 
> unicode char in a string. what is the easiest way to just delete that 
> char and note that it was deleted? it is in a perl scalar and not marked 
> as unicode or anything. it is supposed to be simple text but sometimes 
> we get a DB blowup of a unicode char found.

If the database is MySQL then there is a known and serious bug in
DBD::mysql that apparently can't be fixed because the fix would break
backward compatibility. In my DBIx::Class-based app I have a work-around,
which you can maybe adapt. The bug manifests itself like this: even if
you've configured everything to be UTF-8, data that consists solely of
ASCII plus other Latin-1 characters gets its non-ASCII characters
mis-encoded:

  kitten: ASCII-only, so correctly encoded
  kijtten: ASCII plus non-Latin-1, so correctly encoded
  kijttén: ij isn't Latin-1, so é is correctly encoded
  kittén: no non-Latin-1 characters, so é incorrectly encoded

(in case they don't render properly for you, the funny foreign
characters there are the ij-ligature and e-acute).
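
The split between the four cases above lines up with how perl stores
strings internally, which is probably (an assumption on my part, not
something confirmed from the DBD::mysql source) what the driver trips
over. A minimal sketch with no database involved; the variable names
are just for illustration:

```perl
use strict;
use warnings;

# Every character fits in Latin-1, so perl keeps one byte per character
# and leaves the internal UTF8 flag off.
my $latin1_only = "kitt\x{e9}n";

# U+0133 (ij-ligature) does not fit in Latin-1, so perl stores the whole
# string, e-acute included, in its upgraded (UTF-8) internal form.
my $with_ligature = "k\x{133}tt\x{e9}n";

my $copy        = $latin1_only;
my $flag_before = utf8::is_utf8($latin1_only)   ? 1 : 0;
my $flag_lig    = utf8::is_utf8($with_ligature) ? 1 : 0;

# utf8::upgrade changes only the internal representation;
# the characters in the string stay the same.
utf8::upgrade($latin1_only);
my $flag_after = utf8::is_utf8($latin1_only) ? 1 : 0;
my $same_text  = ($latin1_only eq $copy) ? 1 : 0;

print "latin1 only, before upgrade: $flag_before\n";
print "with ligature:               $flag_lig\n";
print "latin1 only, after upgrade:  $flag_after\n";
print "text unchanged by upgrade:   $same_text\n";
```

The point is that utf8::upgrade is safe to apply to any character
string: it never alters what the string says, only how perl holds it.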

There's another MySQL UTF-8 bug which prevents it from correctly storing
any four-byte UTF-8 character even if it's encoded properly when you
pass it to the database. See https://goo.gl/WGPn1a for the fix.
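
Four-byte UTF-8 characters are exactly the code points above U+FFFF,
so if you would rather drop them than fix the database you can find
them with a character class. A rough sketch, assuming the text has
already been decoded to perl characters; strip_four_byte_chars is a
made-up helper name, not part of any module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: delete every character above U+FFFF and return
# the cleaned text plus a list of what was removed, for logging.
sub strip_four_byte_chars {
    my ($text) = @_;
    my @removed;
    (my $clean = $text) =~ s{([\x{10000}-\x{10FFFF}])}{
        push @removed, sprintf('U+%04X', ord $1);
        '';    # replace the character with nothing
    }gex;
    return ($clean, \@removed);
}

# U+2603 (snowman) needs three bytes in UTF-8, U+1F431 (cat face) four.
my $snowman_bytes = length(encode('UTF-8', "\x{2603}"));
my $cat_bytes     = length(encode('UTF-8', "\x{1F431}"));

my ($clean, $removed) =
    strip_four_byte_chars("snowman \x{2603} and cat \x{1F431}!");

print "snowman: $snowman_bytes bytes, cat: $cat_bytes bytes\n";
print "removed: @$removed\n";
```

The same shape works for "delete it and note that it was deleted"
with any other range of unwanted characters; only the character class
changes.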

Anyway, here's my DBIx::Class work-around. DBIx::Class doesn't expose a
suitable hook, so I've monkey-patched DBIx::Class::Storage::DBI:

# work-around for a bug in DBD::mysql that lets us write ij-ligature
# and snowman to the db, but not i-acute
BEGIN {
    my $old_ex = \&DBIx::Class::Storage::DBI::_dbh_execute;
    my $new_ex = sub {
        # we need to mangle $bind:
        # my ($self, $dbh, $sql, $bind, $bind_attrs) = @_;
        #   
        # It looks like ...
        # [ 
        #     [ ... ],
        #     ...
        #     [
        #         {
        #             dbic_colname => 'message'
        #         },
        #         'some data',
        #     ],
        #     [
        #         {
        #             dbic_colname => 'method'
        #         },
        #         undef,
        #     ],
        #     ...
        # ] 
        # and we need to edit it in place
        foreach (@{$_[3]}) {
            # skip NULLs; utf8::upgrade changes only perl's internal
            # representation of the value, not its characters
            if(defined($_->[1])) {
                utf8::upgrade($_->[1])
            }
        }
        return $old_ex->(@_);
    };  

    {   
        no strict qw/ refs /;
        no warnings 'redefine';
        *DBIx::Class::Storage::DBI::_dbh_execute = $new_ex;
    }   
}
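
If you want to check the wrapping technique without a database, the
same pattern can be tried against a toy package. This is only a sketch:
My::FakeStorage is a made-up stand-in for DBIx::Class::Storage::DBI,
and its _dbh_execute just reports whether each bind value arrived in
upgraded form.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real storage class.
package My::FakeStorage;

sub _dbh_execute {
    my ($self, $dbh, $sql, $bind, $bind_attrs) = @_;
    # For each bind pair: 1 if upgraded, 0 if not, undef for NULL.
    return [
        map { defined $_->[1] ? (utf8::is_utf8($_->[1]) ? 1 : 0) : undef }
            @$bind
    ];
}

package main;

{
    # Keep a reference to the original, then install a wrapper that
    # upgrades every defined bind value in place before delegating.
    my $old_ex = \&My::FakeStorage::_dbh_execute;
    no warnings 'redefine';
    *My::FakeStorage::_dbh_execute = sub {
        for my $pair (@{ $_[3] }) {
            utf8::upgrade($pair->[1]) if defined $pair->[1];
        }
        return $old_ex->(@_);
    };
}

my $bind = [
    [ { dbic_colname => 'message' }, "kitt\x{e9}n" ],
    [ { dbic_colname => 'method'  }, undef ],
];

my $flags = My::FakeStorage->_dbh_execute(undef, 'SELECT 1', $bind, undef);

print "message upgraded: $flags->[0]\n";
print "method is NULL:   ", (defined $flags->[1] ? 'no' : 'yes'), "\n";
```

Because the wrapper edits the bind array in place, the caller's data
is upgraded too, which is harmless since the text itself is unchanged.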

-- 
David Cantrell | http://www.cantrell.org.uk/david

"The whole aim of practical politics is to keep the populace alarmed
 (and hence clamorous to be led to safety) by menacing it with an
 endless series of hobgoblins, all of them imaginary"  -- H. L. Mencken

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
