I was asked to explain the "what's that on your screen?
.̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̸̨̨̨̨̨̨̨̨̨̨̨̨.̸̸̨̨" Twitter meme that
originated with
http://twitter.com/dailydylann/statuses/63228871759237120. When I
tried to insert the Unicode character combination into my post,
however, Habari returned a run of question marks in place of the
UTF-8 characters I had entered.

I tracked the problem down to the MySQL connection adapter: it only
calls 'SET CHARACTER SET UTF8' (or whatever your MYSQL_CHAR_SET is set
to; the default is UTF8), but not 'SET NAMES UTF8'. According to a
source code comment, "SET CHARACTER SET covers all the values included
in SET NAMES, as per http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html".
However, that is not true, as outlined (albeit not very clearly) on
the very page linked above:

== Quote from the page ==

A `SET NAMES 'x'` statement is equivalent to these three statements:

   SET character_set_client = x;
   SET character_set_results = x;
   SET character_set_connection = x;

[...]

A `SET CHARACTER SET x` statement is equivalent to these three statements:

   SET character_set_client = x;
   SET character_set_results = x;
   SET collation_connection = @@collation_database;

Setting collation_connection also sets character_set_connection to the
character set associated with the collation (equivalent to executing
SET character_set_connection = @@character_set_database). It is not
necessary to set character_set_connection explicitly.

== End quote ==
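
For anyone who wants to play with the two expansions quoted above, here
is a toy model in Python. It is only a sketch: it does not talk to
MySQL, the function names are mine, it tracks just the variables named
in the quote, and the `legacy` dict assumes a database created with
latin1 / latin1_swedish_ci.

```python
def set_names(session, charset):
    """Model SET NAMES x: all three variables take the requested charset."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_results"] = charset
    updated["character_set_connection"] = charset
    return updated


def set_character_set(session, charset):
    """Model SET CHARACTER SET x as expanded in the quoted manual page."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_results"] = charset
    # collation_connection is copied from the database, and that in turn
    # pulls character_set_connection to the database's character set.
    updated["collation_connection"] = updated["collation_database"]
    updated["character_set_connection"] = updated["character_set_database"]
    return updated


# Assumed database settings (hypothetical legacy install).
legacy = {
    "character_set_database": "latin1",
    "collation_database": "latin1_swedish_ci",
}

print(set_names(legacy, "utf8")["character_set_connection"])          # utf8
print(set_character_set(legacy, "utf8")["character_set_connection"])  # latin1
```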

The important difference is that SET CHARACTER SET sets
collation_connection to the collation of the selected database, and
that in turn determines character_set_connection. Thus, the two
statements are equivalent ONLY if the database Habari uses has a
utf8_* collation (that is, its character set is already utf8). If,
however, your database collation is, for example, the old MySQL
default of latin1_swedish_ci, then:

SET NAMES UTF8;

will set your character_set_connection to UTF8, whereas

SET CHARACTER SET UTF8;

will set your character_set_connection to latin1!
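
That also explains the question marks. The rough Python sketch below
mimics what happens when text is converted to a connection character
set that cannot represent it. The assumptions are mine: Python's
"latin-1" codec stands in for MySQL's latin1, `through_connection` is a
made-up helper, and the sample uses two combining marks (U+0338 and
U+0328) of the kind found in the meme.

```python
def through_connection(text, connection_charset):
    """Round-trip text through a charset, replacing what it cannot hold.

    Characters with no representation in the target charset come back
    as '?', which is what a lossy conversion looks like to the user.
    """
    encoded = text.encode(connection_charset, errors="replace")
    return encoded.decode(connection_charset)


sample = ".\u0338\u0328."  # dot, two combining marks, dot

print(through_connection(sample, "utf-8") == sample)  # True
print(through_connection(sample, "latin-1"))          # .??.
```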

I thus amended system/schema/mysql/connection.php with:

                $this->exec('SET NAMES ' . MYSQL_CHAR_SET);

after the "SET CHARACTER SET" line and removed the misleading comment.
I hope this fixes the UTF-8 issues once and for all.

Regards,
Matt
