On Sat, May 13, 2006 at 08:41:33PM +0100, Martin J. Evans wrote:
> Hi,
>
> This is really a progress report - some of it I've already sent to Patrick.
>
> Attached is a patch to DBD::mysql 3.0003_1 which:
>
> o fixes a few compile issues with were caused by declarations
> in the middle of a block - already sent to Patrick.
>
> o fixes the memory corruption I previously reported on dbi-users.
> Sent to Patrick but the initial fix had problems - I've resent
> the changes to Patrick.
>
> o introduces a small change to get utf8 data back
> All this does is check charsetnr to see if it is 33 and
> turns on the utf8 flag if it is.
> You still need to do one or both of:
>
> $dbh->do("set character set utf8");
> $dbh->do("set names utf8");
>
> to get utf8 back and even then you only get it back if the
> column is defined as utf8 in mysql.
That's not ideal. It should be possible to set the 'connection charset'
to utf8 so all charsets are auto-converted to utf8 my mysql. That would
be *much* more useful.
> I'm not clear exactly which of the above 2 settings you need to make
> as there is contradictory evidence on the net about this.
This needs to be cleared up and the effect of each documented.
> I've tested this a little - so far as my utf8 defined tables
> which contain utf8 data come back fine and none uf8 columns
> are untouched.
>
> If anyone else fancies testing this or helping with information
> about what you need to set (character set or names) I'd be pleased
> to hear about it. My primary interest is fairly selfish in that I
> need utf8 support but I'm happy to share what I find/change but I
> also don't want to step on Patrick's shoes - as such this is cc'ed to
> Patrick (and I also appologise for the first attempt I sent
> you for fixing the memory corruption - my mistake - I sent an
> out of date version).
>
> I should probably also point out that this patch is against
> a development release of dbd::mysql and all that implies.
>
> There was also a mail from Henri on dbi-users about using TYPE=>SQL_INTEGER
> in bind_param call - I don't do this and can't (currently from the existing
> code) see how it would work.
>
> As an aside, if anyone knows a definite list of charsetnr values
> in mysql I would be interested in it as my patch is based on
> what I've seen rather than what I've read anywhere.
Patrick (or someone), please file a doc bug with mysql to get them to
document the values of charsetnr.
Thanks Martin.
Tim.
> I did try
> adding get_charset calls to dbdimp.c for DBD::mysql to find
> the charset name but to do this I needed to add a load of #includes
> to dbdimp.h which caused a lot of compiler warnings. This was
> based on code I found at http://bugs.mysql.com/bug.php?id=6911.
>
> Martin
> --
> Martin J. Evans
> Easysoft Ltd, UK
> http://www.easysoft.com
>
>
> On 04-May-2006 Martin J. Evans wrote:
> > Tim,
> >
> > On 04-May-2006 Tim Bunce wrote:
> >> On Sun, Apr 30, 2006 at 01:36:04PM -0700, Patrick Galbraith wrote:
> >>> Martin J. Evans wrote:
> >>>
> >>> Martin,
> >>>
> >>> Thanks much! This is dbdimp.c, right? I will add this tomorrow (not
> >>> working today), and test it out.
> >
> >
> >> Please don't use only is_high_bit_set() to enable UTF8. That'll break
> >> any code that is storing non-utf8 data that happens to have the high-bit
> >> set.
> >>
> >> Please make sure the test cases cover this situation. It's not enough
> >> to get 'utf8 working' its also important to not break existing code.
> >>
> >> Using the 'charsetnr' value (see below) looks far more correct. That way
> >> perl will treat the values as UTF8 only if mysql was treating it as UTF8.
> >
> > Sorry, I should have made it clearer it was only a demonstration that utf8
> > can
> > work with mysql as someone had been asking that. I had already told Patrick
> > that off the list. I fully realised that hack would break 8 bit chrsets.
> >
> > I have already started looking at charsetnr but have run into a number of
> > issues due to the way charsetnr has changed over different versions of
> > mysql.
> >
> > Martin
> > --
> > Martin J. Evans
> > Easysoft Ltd, UK
> > http://www.easysoft.com
> >
> >
> >>> >>>The keys mysql docs seem to be
> >>> >>>http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
> >>> >>>
> >>> >>>The mysql api and client->server protocol doesn't support passing
> >>> >>>characterset info to the server on a per-statement / per-bind value
> >>> >>>basis.
> >>> >>>(http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatype
> >>> >>>s
> >>> >>>.html)
> >>> >>>So the sane way to send utf8 to the server is by setting the
> >>> >>>'connection
> >>> >>>character set' to utf8 and then only sending utf8 (or its ASCII subset)
> >>> >>>to the server on that connection.
> >>> >>>
> >>> >>>*** Fetching data:
> >>> >>>
> >>> >>>MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELD
> >>> >>>structure.
> >>> >>>It's the "character set number for the field".
> >>> >>>
> >>> >>>So set the UTF8 flag based on that value. Something like:
> >>> >>> (field->charsetnr = ???) ? SvUTF8_on(sv) : SvUTF8_off(sv);
> >>> >>>I couldn't see any docs for the values of the charsetnr field.
> >>> >>>
> >>> >>>Also, would be good to enable perl code to access the charsetnr values:
> >>> >>> $sth->{mysql_charsetnr}->[$i]
> >>> >>>
> >>> >>>*** Fetching Metadata:
> >>> >>>
> >>> >>>The above is a minimum. It doesn't address metadata like field names
> >>> >>>($sth->{NAME}) that might also be in utf8. For that the driver needs to
> >>> >>>know if the 'connection character set' is currently utf8.
> >>> >>>
> >>> >>>(The docs mention mysql->charset but it's not clear if that's part of
> >>> >>>the public API.)
> >>> >>>
> >>> >>>However it's detected, the code needs to end up doing:
> >>> >>> (...connection charset is utf8...) ? SvUTF8_on(sv) : SvUTF8_off(sv);
> >>> >>>on the metadata.
> >>> >>>
> >>> >>>
> >>> >>>*** SET NAMES '...'
> >>> >>>
> >>> >>>Intercept SET NAMES and call the mysql_set_character_set() API instead.
> >>> >>>See http://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html
> >>> >>>
> >>> >>>
> >>> >>>*** Detecting Inconsistencies
> >>> >>>
> >>> >>>If the connection character set is _not_ utf8 but the application calls
> >>> >>>the driver with data (or SQL statement) that has the UTF8 flag set,
> >>> >>>then
> >>> >>>it could issue a warning. In practice that may be to be too noisy for
> >>> >>>people that done their own workarounds for utf8 support. If so then
> >>> >>>they could be changes to level 1 trace messages.
> >>> >>>
> >>> >>>If the connection character set _is_ utf8, and the application calls
> >>> >>>the driver with data (or SQL statement) that does _not_ have the UTF8
> >>> >>>flag set but _does_ have bytes with the high bit set, then the driver
> >>> >>>should issue a warning. The checking for high bit set is an extra cost
> >>> >>>so this should only be enabled if tracing and/or an attribute is set
> >>> >>>(perhaps called $dbh->{mysql_charset_checks} = 1)
> >>> >>>
> >>> >>>Tim.
> >>> >>>
> >>> >>>
> >>>