On Sat, May 13, 2006 at 08:41:33PM +0100, Martin J. Evans wrote:
> Hi,
> 
> This is really a progress report - some of it I've already sent to Patrick.
> 
> Attached is a patch to DBD::mysql 3.0003_1 which:
> 
> o fixes a few compile issues which were caused by declarations
>   in the middle of a block - already sent to Patrick.
> 
> o fixes the memory corruption I previously reported on dbi-users.
>   Sent to Patrick but the initial fix had problems - I've resent
>   the changes to Patrick.
> 
> o introduces a small change to get utf8 data back
>   All this does is check charsetnr to see if it is 33 and
>   turns on the utf8 flag if it is.
>   You still need to do one or both of:
> 
> $dbh->do("set character set utf8");
> $dbh->do("set names utf8");
> 
>   to get utf8 back and even then you only get it back if the
>   column is defined as utf8 in mysql.

That's not ideal. It should be possible to set the 'connection charset'
to utf8 so all charsets are auto-converted to utf8 by mysql. That would
be *much* more useful.

>   I'm not clear exactly which of the above 2 settings you need to make
>   as there is contradictory evidence on the net about this.

This needs to be cleared up and the effect of each documented.

>   I've tested this a little - so far my utf8-defined tables
>   which contain utf8 data come back fine and non-utf8 columns
>   are untouched.
> 
> If anyone else fancies testing this or helping with information
> about what you need to set (character set or names) I'd be pleased
> to hear about it. My primary interest is fairly selfish in that I
> need utf8 support but I'm happy to share what I find/change but I
> also don't want to step on Patrick's toes - as such this is cc'ed to
> Patrick (and I also apologise for the first attempt I sent
> you for fixing the memory corruption - my mistake - I sent an
> out of date version).
> 
> I should probably also point out that this patch is against
> a development release of DBD::mysql and all that implies.
> 
> There was also a mail from Henri on dbi-users about using TYPE=>SQL_INTEGER
> in bind_param call - I don't do this and can't (currently from the existing
> code) see how it would work.
> 
> As an aside, if anyone knows a definite list of charsetnr values
> in mysql I would be interested in it as my patch is based on
> what I've seen rather than what I've read anywhere.

Patrick (or someone), please file a doc bug with mysql to get them to
document the values of charsetnr.
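
For anyone following along, the check Martin describes might look roughly
like the sketch below. It is not the actual patch: struct field_sketch is
a stand-in for the one MYSQL_FIELD member involved, field_is_utf8() is a
hypothetical helper name, and 33 is only the value reported in this
thread, not something taken from a documented list. As far as I know
charsetnr carries a collation id (33 being utf8_general_ci), so other
utf8 collations would probably need their own entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the single MYSQL_FIELD member this sketch needs; the real
 * struct comes from <mysql.h> and has many more members. */
struct field_sketch {
    unsigned int charsetnr;   /* "character set number for the field" */
};

/* Observed value only: 33 is what the patch in this thread tests for. */
#define CHARSETNR_UTF8_OBSERVED 33u

/* Hypothetical helper: returns true where dbdimp.c would call
 * SvUTF8_on(sv) on the fetched value, false where it would leave the
 * value as plain bytes. */
static bool field_is_utf8(const struct field_sketch *field)
{
    assert(field != NULL);
    return field->charsetnr == CHARSETNR_UTF8_OBSERVED;
}
```

Keeping the comparison in one helper would make it easy to extend once
the charsetnr values are documented.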

Thanks Martin.

Tim.

> I did try
> adding get_charset calls to dbdimp.c for DBD::mysql to find
> the charset name but to do this I needed to add a load of #includes
> to dbdimp.h which caused a lot of compiler warnings. This was
> based on code I found at http://bugs.mysql.com/bug.php?id=6911.
> 
> Martin
> --
> Martin J. Evans
> Easysoft Ltd, UK
> http://www.easysoft.com
> 
> 
> On 04-May-2006 Martin J. Evans wrote:
> > Tim,
> > 
> > On 04-May-2006 Tim Bunce wrote:
> >> On Sun, Apr 30, 2006 at 01:36:04PM -0700, Patrick Galbraith wrote:
> >>> Martin J. Evans wrote:
> >>> 
> >>> Martin,
> >>> 
> >>> Thanks much! This is dbdimp.c, right? I will add this tomorrow (not
> >>> working today), and test it out.
> > 
> >  
> >> Please don't use only is_high_bit_set() to enable UTF8.  That'll break
> >> any code that is storing non-utf8 data that happens to have the high-bit
> >> set.
> >> 
> >> Please make sure the test cases cover this situation. It's not enough
> >> to get 'utf8 working' it's also important to not break existing code.
> >> 
> >> Using the 'charsetnr' value (see below) looks far more correct. That way
> >> perl will treat the values as UTF8 only if mysql was treating it as UTF8.
> > 
> > Sorry, I should have made it clearer it was only a demonstration that utf8
> > can work with mysql as someone had been asking that. I had already told
> > Patrick that off the list. I fully realised that hack would break 8 bit
> > charsets.
> > 
> > I have already started looking at charsetnr but have run into a number of
> > issues due to the way charsetnr has changed over different versions of 
> > mysql.
> > 
> > Martin
> > --
> > Martin J. Evans
> > Easysoft Ltd, UK
> > http://www.easysoft.com
> > 
> > 
> >>> >>>The key mysql docs seem to be
> >>> >>>http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
> >>> >>>
> >>> >>>The mysql api and client->server protocol doesn't support passing
> >>> >>>characterset info to the server on a per-statement / per-bind value basis.
> >>> >>>(http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatypes.html)
> >>> >>>So the sane way to send utf8 to the server is by setting the 'connection
> >>> >>>character set' to utf8 and then only sending utf8 (or its ASCII subset)
> >>> >>>to the server on that connection.
> >>> >>>
> >>> >>>*** Fetching data:
> >>> >>>
> >>> >>>MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELD structure.
> >>> >>>It's the "character set number for the field".
> >>> >>>
> >>> >>>So set the UTF8 flag based on that value. Something like:
> >>> >>>   (field->charsetnr == ???) ? SvUTF8_on(sv) : SvUTF8_off(sv);
> >>> >>>I couldn't see any docs for the values of the charsetnr field.
> >>> >>>
> >>> >>>Also, would be good to enable perl code to access the charsetnr values:
> >>> >>>   $sth->{mysql_charsetnr}->[$i]
> >>> >>>
> >>> >>>*** Fetching Metadata:
> >>> >>>
> >>> >>>The above is a minimum. It doesn't address metadata like field names
> >>> >>>($sth->{NAME}) that might also be in utf8. For that the driver needs to
> >>> >>>know if the 'connection character set' is currently utf8.
> >>> >>>
> >>> >>>(The docs mention mysql->charset but it's not clear if that's part of
> >>> >>>the public API.)
> >>> >>>
> >>> >>>However it's detected, the code needs to end up doing:
> >>> >>>   (...connection charset is utf8...) ? SvUTF8_on(sv) : SvUTF8_off(sv);
> >>> >>>on the metadata.
> >>> >>>
> >>> >>>
> >>> >>>*** SET NAMES '...'
> >>> >>>
> >>> >>>Intercept SET NAMES and call the mysql_set_character_set() API instead.
> >>> >>>See http://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html
> >>> >>>
> >>> >>>
> >>> >>>*** Detecting Inconsistencies
> >>> >>>
> >>> >>>If the connection character set is _not_ utf8 but the application calls
> >>> >>>the driver with data (or SQL statement) that has the UTF8 flag set, then
> >>> >>>it could issue a warning. In practice that may be too noisy for
> >>> >>>people that have done their own workarounds for utf8 support. If so then
> >>> >>>they could be changed to level 1 trace messages.
> >>> >>>
> >>> >>>If the connection character set _is_ utf8, and the application calls
> >>> >>>the driver with data (or SQL statement) that does _not_ have the UTF8
> >>> >>>flag set but _does_ have bytes with the high bit set, then the driver
> >>> >>>should issue a warning. The checking for high bit set is an extra cost
> >>> >>>so this should only be enabled if tracing and/or an attribute is set
> >>> >>>(perhaps called $dbh->{mysql_charset_checks} = 1)
> >>> >>>
> >>> >>>Tim.
> >>> >>>     
> >>> >>>
> >>> 
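
The "intercept SET NAMES" idea in the quoted notes above needs some way
to recognise the statement before it goes to the server. A minimal
sketch of that recognition step follows. All function names here are
hypothetical, and it only handles the plain "SET NAMES <charset>" form
(optionally quoted, optional trailing semicolon); anything else, such as
a COLLATE clause, is deliberately reported as "not a match" so it would
be passed through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Skip leading whitespace. */
static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Case-insensitively match keyword `kw` (given in upper case) at `p`.
 * Returns a pointer just past the keyword, or NULL if it does not match. */
static const char *match_kw(const char *p, const char *kw)
{
    while (*kw) {
        if (toupper((unsigned char)*p) != *kw)
            return NULL;
        p++;
        kw++;
    }
    return p;
}

/* Hypothetical helper: if `sql` is exactly "SET NAMES <charset>", copy the
 * charset name into `out` and return true. On failure the contents of
 * `out` are unspecified. */
static bool parse_set_names(const char *sql, char *out, size_t outlen)
{
    const char *p;
    char quote = '\0';
    size_t n = 0;

    assert(sql != NULL && out != NULL && outlen > 0);

    p = match_kw(skip_ws(sql), "SET");
    if (p == NULL || !isspace((unsigned char)*p))
        return false;
    p = match_kw(skip_ws(p), "NAMES");
    if (p == NULL || !isspace((unsigned char)*p))
        return false;
    p = skip_ws(p);

    if (*p == '\'' || *p == '"')
        quote = *p++;

    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 >= outlen)
            return false;       /* name too long for the buffer */
        out[n++] = *p++;
    }
    out[n] = '\0';

    if (quote != '\0' && *p++ != quote)
        return false;           /* missing closing quote */
    p = skip_ws(p);
    if (*p == ';')
        p = skip_ws(p + 1);
    return n > 0 && *p == '\0'; /* reject trailing clauses */
}
```

On a match the driver would presumably hand the extracted name to
mysql_set_character_set() rather than sending the statement itself, so
that the client library also learns about the new connection charset.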

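The two inconsistency checks from the quoted notes can be written as one
small decision function. This is only a sketch under assumptions: the
enum, the function names and the three boolean inputs are made up for
illustration (in the driver they would come from the connection state,
SvUTF8(sv) and something like the proposed mysql_charset_checks
attribute). Note that high-bit bytes on a non-utf8 connection are left
alone, which is the "don't break existing 8 bit data" point made earlier
in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum charset_check {
    CHECK_OK,
    CHECK_UTF8_DATA_ON_NON_UTF8_CONN,  /* UTF8-flagged data, non-utf8 connection */
    CHECK_HIGH_BIT_WITHOUT_UTF8_FLAG   /* unflagged high-bit bytes, utf8 connection */
};

/* Returns true if any byte in buf has its high bit set. */
static bool has_high_bit(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return true;
    return false;
}

/* Hypothetical helper: classify one value (or SQL string) before sending.
 * The byte scan is the expensive part, so it only runs when
 * checks_enabled is true. */
static enum charset_check check_value(bool conn_is_utf8, bool sv_is_utf8,
                                      bool checks_enabled,
                                      const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (!conn_is_utf8)
        return sv_is_utf8 ? CHECK_UTF8_DATA_ON_NON_UTF8_CONN : CHECK_OK;
    if (!sv_is_utf8 && checks_enabled && has_high_bit(buf, len))
        return CHECK_HIGH_BIT_WITHOUT_UTF8_FLAG;
    return CHECK_OK;
}
```

Whether a non-OK result becomes a warning or a level 1 trace message is
left to the caller.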
