Tim,

On 04-May-2006 Tim Bunce wrote:
> On Sun, Apr 30, 2006 at 01:36:04PM -0700, Patrick Galbraith wrote:
>> Martin J. Evans wrote:
>>
>> Martin,
>>
>> Thanks much! This is dbdimp.c, right? I will add this tomorrow (not
>> working today), and test it out.
>
> Please don't use only is_high_bit_set() to enable UTF8. That'll break
> any code that is storing non-utf8 data that happens to have the high-bit set.
>
> Please make sure the test cases cover this situation. It's not enough
> to get 'utf8 working'; it's also important to not break existing code.
>
> Using the 'charsetnr' value (see below) looks far more correct. That way
> perl will treat the values as UTF8 only if mysql was treating it as UTF8.

Sorry, I should have made it clearer that it was only a demonstration that
utf8 can work with mysql, as someone had been asking about that. I had already
told Patrick that off the list. I fully realised that hack would break 8-bit
charsets.

I have already started looking at charsetnr but have run into a number of
issues due to the way charsetnr has changed over different versions of mysql.

Martin
--
Martin J. Evans
Easysoft Ltd, UK
http://www.easysoft.com

>> >>>The key mysql docs seem to be
>> >>>http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
>> >>>
>> >>>The mysql api and client->server protocol doesn't support passing
>> >>>characterset info to the server on a per-statement / per-bind value
>> >>>basis.
>> >>>(http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatypes.html)
>> >>>So the sane way to send utf8 to the server is by setting the 'connection
>> >>>character set' to utf8 and then only sending utf8 (or its ASCII subset)
>> >>>to the server on that connection.
>> >>>
>> >>>*** Fetching data:
>> >>>
>> >>>MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELD structure.
>> >>>It's the "character set number for the field".
>> >>>
>> >>>So set the UTF8 flag based on that value. Something like:
>> >>>    (field->charsetnr == ???) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>> >>>I couldn't see any docs for the values of the charsetnr field.
>> >>>
>> >>>Also, would be good to enable perl code to access the charsetnr values:
>> >>>    $sth->{mysql_charsetnr}->[$i]
>> >>>
>> >>>*** Fetching Metadata:
>> >>>
>> >>>The above is a minimum. It doesn't address metadata like field names
>> >>>($sth->{NAME}) that might also be in utf8. For that the driver needs to
>> >>>know if the 'connection character set' is currently utf8.
>> >>>
>> >>>(The docs mention mysql->charset but it's not clear if that's part of
>> >>>the public API.)
>> >>>
>> >>>However it's detected, the code needs to end up doing:
>> >>>    (...connection charset is utf8...) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>> >>>on the metadata.
>> >>>
>> >>>
>> >>>*** SET NAMES '...'
>> >>>
>> >>>Intercept SET NAMES and call the mysql_set_character_set() API instead.
>> >>>See http://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html
>> >>>
>> >>>
>> >>>*** Detecting Inconsistencies
>> >>>
>> >>>If the connection character set is _not_ utf8 but the application calls
>> >>>the driver with data (or SQL statement) that has the UTF8 flag set, then
>> >>>it could issue a warning. In practice that may be too noisy for
>> >>>people that have done their own workarounds for utf8 support. If so then
>> >>>they could be changed to level 1 trace messages.
>> >>>
>> >>>If the connection character set _is_ utf8, and the application calls
>> >>>the driver with data (or SQL statement) that does _not_ have the UTF8
>> >>>flag set but _does_ have bytes with the high bit set, then the driver
>> >>>should issue a warning. The checking for high bit set is an extra cost
>> >>>so this should only be enabled if tracing and/or an attribute is set
>> >>>(perhaps called $dbh->{mysql_charset_checks} = 1)
>> >>>
>> >>>Tim.
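
To make the two decisions in the quoted proposal concrete, here is a rough standalone sketch in plain C. It is not code from dbdimp.c and uses neither the Perl nor the MySQL headers. All four function names are hypothetical. The charsetnr values 33 and 83 are assumed here to be the ids of utf8_general_ci and utf8_bin; as noted above, the values have changed between mysql versions, so they would need checking against each server version before being relied on.

```c
#include <stddef.h>

/* ASSUMPTION: ids treated as "utf8" for this sketch only.
 * 33 and 83 are taken to be utf8_general_ci and utf8_bin; other
 * server versions may use further ids, so a real driver would need
 * a fuller, version-aware check. */
static int charsetnr_is_utf8(unsigned int charsetnr)
{
    return charsetnr == 33 || charsetnr == 83;
}

/* Returns 1 if any byte in buf[0..len) has the high bit set. */
static int has_high_bit(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (buf[i] & 0x80)
            return 1;
    }
    return 0;
}

/* Fetch side: decide from the column's charsetnr, never from the
 * bytes themselves, so 8-bit data in a non-utf8 column is left alone.
 * The caller would turn the result into SvUTF8_on / SvUTF8_off. */
static int should_set_utf8_flag(unsigned int charsetnr)
{
    return charsetnr_is_utf8(charsetnr);
}

/* Bind side: the optional consistency check. Warn only when the
 * connection is utf8, the value is not flagged as utf8, and it
 * nevertheless contains high-bit bytes. */
static int should_warn_on_bind(int conn_is_utf8, int value_is_utf8,
                               const unsigned char *buf, size_t len)
{
    return conn_is_utf8 && !value_is_utf8 && has_high_bit(buf, len);
}
```

With this split, a latin1 value such as "caf\xE9" fetched from a non-utf8 column is never flagged as UTF8 even though its last byte has the high bit set, which is the case the high-bit-only hack gets wrong.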
