Re: Adding utf8 support to DBD::mysql

Martin J. Evans Sun, 30 Apr 2006 14:52:30 -0700

Patrick Galbraith wrote:

Martin J. Evans wrote:
Martin,
Sure, I'll be glad to put in the code. If you can come up with tests,that's one thing we need. Whatever you like, and whatever works for you.I'll always give you due credit.

Slightly confused as I think I provided tests for the changes I'vesubmitted.


Was there some changes I omitted tests for? (other than the utf8 hack
I just posted).

I noticed there were to dos in the the TODO file for more tests -
I can try and knock a few of those off during a free moment.

I also would like to know how to have a test end its while loop if aversion doesn't support the server it's running against - if you knowthat, that would be fabulous. I came up with a way by querying thedatabase, then checking if a string is /^4/ or /^3/ (stored procs), ifit matches that, set $state = 1. I think there may be a better way!(well, that'd be Test::More I think)


I'll look in to it but don't know off the top of my head.
I saw something in the Testing() fn about skipping/ignoring tests.

Before I roll a dev release, I'd like to make sure I have looked at andadded all of your patches that you have, as well as have tests for them.


Great - let me know when it is in subversion and I'll take another look.

Martin

Again, Thank you!

Patrick

Patrick Galbraith wrote:

Martin J. Evans wrote:

Martin,

Thanks much! This is dbdimp.c, right?

Yes

I  will add this tomorrow (not working today),




and I'm stricly not working tomorrow - bank holiday - but it is likely
I'll find time to do something.

and test it out. If everything is a go, then I'll roll up a dev3.0003_1 releaseAlso, I've mentioned giving you write access to SVNfor DBD::mysql - do you want that? Also, what is your cpan username?




Argh, tend to be a submitter/commentor rather than a creator.
I've written DBIx::Log4perl but still have not submitted it
as I have not yet got a cpan username.

If you are happy giving me write access to DBD::mysql I'm happy to
contribute although I think you've got most of my stuff now. If
all it means is getting a cpan username and that is straight forward
I'll do that also.

The utf8 patch is very much a quick hack but it you were to submit
the other patches to dbd::mysql first I'd happily investigate/test
further. The reason I say this is that I'm finding it increasingly
difficult to submit patches on top of patches that are not comitted
yet.

Martin

Kind regards,

Patrick

Shamelessy stolen from Michael Kröll and against a patch on dbd::mysql
on subversion I've sent to Patrick which is not committed yet:

@@ -2915,6 +2914,9 @@
          if (dbis->debug >= 2)

PerlIO_printf(DBILOGFP, "st_fetch string data %s\n",fbh->data);

          sv_setpvn(sv, fbh->data, fbh->length);
+          if (is_high_bit_set(fbh->data)) {
+              SvUTF8_on(sv);
+          }
          break;
@@ -2968,6 +2970,10 @@
          {    --len; }
        }
        sv_setpvn(sv, col, len);
+        if (is_high_bit_set(col)) {
+            SvUTF8_on(sv);
+        }
+       @@ -3881,3 +3901,11 @@
  return

sv_2mortal(my_ulonglong2str(mysql_insert_id(&((imp_dbh_t*)imp_dbh)->mysql)));

}
#endif
+
+int is_high_bit_set(val)
+                char *val;
+{
+    while (*val)
+        if (*val++ & 0x80) return 1;
+    return 0;
+}

would appear to make the previous code (below and without theEncode) I posted

work fine.

I hasten to add I've not tested this much (other than the postedtest code).


What I'm personally interested in is whether retrieving utf data from

DBD::mysql and pushing it through JSON, JSON->objToJson works and Icannot

easily test this until Tuesday.

If Patrick can commit my previous changes I'll test this more.

Martin
--
Martin J. Evans
Easysoft Ltd, UK
http://www.easysoft.com


On 30-Apr-2006 Martin J. Evans wrote:

I hope I'm not muddying the waters but dbd::mysql UTF support forreturned

data (not metadata) seems to be nearly there. I need UTF8 support on
at least inserted and returned results-sets although I'm less bothered

by UTF8 table/column names. It would seem that if you define yourtable as

using UTF8 then insertion is not a problem but retrieval is. The

following code nearly works - it is just the setting of the utf8flag on

the returned data that is wrong:

#!/usr/bin/perl -w
use strict;
use DBI qw(:utils);
use charnames ':full';
use Encode;

print "Is utf8::is_utf8 defined: ", defined &utf8::is_utf8, "\n";
print "Is utf8::valid defined: ", defined &utf8::valid, "\n";

my $str = "\x{263a}xxx" . chr(0x05d0) . "\N{ARABIC LETTER ALEF}"; #smiley

print join(" ", unpack("H*", $str)), "\n";
print "length(str) = ", length($str), "\n";
print "bytes::length(str) = ", bytes::length($str), "\n";

print "utf8::is_utf8 = ", utf8::is_utf8($str) ? 1 : 0, "\n";
print "data_string_desc: ", data_string_desc($str),"\n";

open OUT, ">uni.out";
binmode(OUT, ":utf8");
print OUT "$str\n";
# data written to uni.out is UTF8

my $dbh = DBI->connect("dbi:mysql:test", "xxx", "xxx");
# there are posts on dbi-user as to whether both or either of
# the following should be set
$dbh->do("set character set utf8");
$dbh->do("set names utf8");
$dbh->do("drop table if exists utf");
$dbh->do("create table utf (a char(100)) default charset utf8");
my $sth = $dbh->prepare("insert into utf values (?)");
$sth->execute($str);
$sth = $dbh->prepare("select * from utf");
$sth->execute;

my @row = $sth->fetchrow_array;

print "data_string_desc (after fetch): ",data_string_desc($row[0]),"\n";

# the following shows we'e got the right data back
# but perl does not know it is utf8
print join(" ", unpack("H*", $row[0])), "\n";
# turning on utf8 causes the rignt uf8 sequence to be output
# and hence sv_utf8_upgrade(sv) will probably work
Encode::_utf8_on($row[0]);

print "data_string_desc (after fetch): ",data_string_desc($row[0]),"\n";


open OUT, ">utf.out";
binmode (OUT, ":utf8");
print OUT $row[0];
close OUT;
# data written to utf.out is not UTF8 unless is marked utf8

produces:

Is utf8::is_utf8 defined: 1
Is utf8::valid defined: 1
e298ba787878d790d8a7
length(str) = 6
bytes::length(str) = 10
utf8::is_utf8 = 1
data_string_desc: UTF8 on, non-ASCII, 6 characters 10 bytes

data_string_desc (after fetch): UTF8 off, non-ASCII, 10 characters10 bytes

e298ba787878d790d8a7

with utf.out containing:

<C3><A2><C2><98><C2><BA>xxx<C3><97><C2><90><C3><98><C2><A7>

without that Encode::_utf8_on($row[0]);

Michael Kröll (apr-10-2006)
posted a change for DBD::db2 which seemed to sort this out for DB2
so a similar change could be added to mysql.

Hope this helps.

Martin
--
Martin J. Evans
Easysoft Ltd, UK
http://www.easysoft.com


On 24-Apr-2006 Tim Bunce wrote:

[I'm at the mysql conference and Patrick asked me about adding utf8
support to DBD::mysql. I said I'd look at the libmysql docs andgive mythoughts. I'm posting to dbi-dev since it may be of interest toothers
interested in enhancing DBD::mysql and to other driver developers.
These are just random thoughts from a quick look at the docs.]

The keys mysql docs seem to be
http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html

The mysql api and client->server protocol doesn't support passing
characterset info to the server on a per-statement / per-bindvalue basis.(http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatypes.ht
m
l)
So the sane way to send utf8 to the server is by setting the'connectioncharacter set' to utf8 and then only sending utf8 (or its ASCIIsubset)
to the server on that connection.

*** Fetching data:
MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELDstructure.
It's the "character set number for the field".

So set the UTF8 flag based on that value. Something like:
   (field->charsetnr = ???) ? SvUTF8_on(sv) : SvUTF8_off(sv);
I couldn't see any docs for the values of the charsetnr field.
Also, would be good to enable perl code to access the charsetnrvalues:
   $sth->{mysql_charsetnr}->[$i]

*** Fetching Metadata:

The above is a minimum. It doesn't address metadata like field names
($sth->{NAME}) that might also be in utf8. For that the driverneeds to
know if the 'connection character set' is currently utf8.

(The docs mention mysql->charset but it's not clear if that's part of
the public API.)

However it's detected, the code needs to end up doing:
(...connection charset is utf8...) ? SvUTF8_on(sv) :SvUTF8_off(sv);
on the metadata.


*** SET NAMES '...'
Intercept SET NAMES and call the mysql_set_character_set() APIinstead.Seehttp://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html
*** Detecting Inconsistencies
If the connection character set is _not_ utf8 but the applicationcallsthe driver with data (or SQL statement) that has the UTF8 flagset, then
it could issue a warning. In practice that may be to be too noisy for
people that done their own workarounds for utf8 support. If so then
they could be changes to level 1 trace messages.

If the connection character set _is_ utf8, and the application calls
the driver with data (or SQL statement) that does _not_ have the UTF8
flag set but _does_ have bytes with the high bit set, then the driver
should issue a warning. The checking for high bit set is an extracost
so this should only be enabled if tracing and/or an attribute is set
(perhaps called $dbh->{mysql_charset_checks} = 1)

Tim.

Re: Adding utf8 support to DBD::mysql

Reply via email to