Raymond Wan wrote:
> Oh...I see -- thanks for this!
>
> Then I guess there are two combinations:  Mediawiki with latin1 MySQL ; 
> Mediawiki with UTF MySQL.
> What are the advantages / disadvantages of either choice?
>
> I *guess* that if someone were to login to mysql directly, and did a SELECT, 
> then the UTF would look
> like gibberish.  Likewise when a dump is done of the data.  Of course, 
> neither "problem" affects
> Mediawiki's functionality...
>
> Any other pros/cons?
>
> Thanks!
>
> Ray

MediaWiki offers you three character sets for MySQL:
  * MySQL 4.1/5.0 binary
  * MySQL 4.1/5.0 UTF-8
  * MySQL 4.0 backwards-compatible UTF-8

In all three modes MediaWiki stores UTF-8 text. The difference is in
how MySQL treats it.

In "backwards-compatible UTF-8" MySQL thinks the data is latin1. The data
will "look wrong" when you query it directly, and if you don't pass
--default-character-set=latin1 to mysqldump 4.1 or newer, the dump will
corrupt the text (it will "helpfully" convert it to UTF-8 a second time).
This is the only mode that works with MySQL 4.0, and it supports the
full Unicode range.
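
To make the corruption concrete, here is a rough Python sketch of what
that "helpful" conversion does to the bytes. It only models the effect
and is not what mysqldump actually runs; the function names are made up
for illustration, and Python's "latin-1" codec is used as a stand-in for
MySQL's latin1 (which is really closer to cp1252, so a few bytes may
behave differently in practice).

```python
def double_encode(text: str) -> bytes:
    """Model a dump that converts 'latin1' data to UTF-8 (hypothetical helper)."""
    stored = text.encode("utf-8")         # the bytes MediaWiki wrote to the table
    as_latin1 = stored.decode("latin-1")  # MySQL believes each byte is one latin1 char
    return as_latin1.encode("utf-8")      # the conversion encodes each of those again


def repair(dumped: bytes) -> str:
    """Undo the double encoding, assuming nothing else touched the bytes."""
    return dumped.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    broken = double_encode("é")
    print(broken)          # two bytes in, four bytes out
    print(repair(broken))  # back to the original character
```

Dumping with the connection character set forced to latin1 avoids the
conversion in the first place, so no repair step should be needed.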

The "UTF-8" mode uses MySQL's own UTF-8 support, which currently limits
you to the Basic Multilingual Plane. The data will "look right". The
indexes will be larger.
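
As a quick illustration of the Basic Multilingual Plane limit: code
points above U+FFFF need four bytes in UTF-8, and those are the ones
this mode cannot hold. A small Python sketch (the helper names are
hypothetical, not anything MediaWiki or MySQL provides) that finds such
characters in a string:

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def outside_bmp(text: str) -> list:
    """Characters above U+FFFF, i.e. outside the Basic Multilingual Plane."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    sample = "a\u20ac\U0001D11E"  # 'a', EURO SIGN, MUSICAL SYMBOL G CLEF
    for ch in sample:
        print("U+%04X takes %d byte(s)" % (ord(ch), utf8_length(ch)))
    print(outside_bmp(sample))    # only the G clef is outside the BMP
```

If your wiki needs characters like that, the binary or
backwards-compatible mode is the safer choice.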

Binary works almost like backwards-compatible UTF-8, but MySQL treats
the data as opaque bytes and won't try to convert it. It will still look
messy if you query it directly. You get the full Unicode range.


