Raymond Wan wrote:
> Oh...I see -- thanks for this!
>
> Then I guess there are two combinations: Mediawiki with latin1 MySQL ;
> Mediawiki with UTF MySQL.
> What are the advantages / disadvantages of either choice?
>
> I *guess* that if someone were to login to mysql directly, and did a SELECT,
> then the UTF would look like gibberish. Likewise when a dump is done of
> the data. Of course, neither "problem" affects Mediawiki's functionality...
>
> Any other pros/cons?
>
> Thanks!
>
> Ray
MediaWiki offers you three character sets for MySQL:

* MySQL 4.1/5.0 binary
* MySQL 4.1/5.0 UTF-8
* MySQL 4.0 backwards-compatible UTF-8

In all three modes MediaWiki stores UTF-8 characters. The difference is how MySQL treats them.

In "backwards-compatible UTF-8" mode, MySQL thinks the data is latin1. The data will "look wrong", and if you don't pass --default-character-set to mysqldump 4.1 or newer, the dump will corrupt the text (mysqldump will "helpfully" convert it to UTF-8, so it ends up encoded twice). This is the only mode that works with MySQL 4.0, and it supports the full Unicode range.

UTF-8 mode uses MySQL's own UTF-8 support, which currently limits you to the Basic Multilingual Plane (code points up to U+FFFF). The data will "look right". The indexes will be larger, since MySQL sizes utf8 index keys at three bytes per character.

Binary mode works almost like backwards-compatible UTF-8, but MySQL treats the data as opaque bytes and won't convert it. The raw representation will be messy, but you get the full Unicode range.

_______________________________________________
MediaWiki-l mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
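The mysqldump corruption described in the reply is the classic "double encoding" problem: UTF-8 bytes labelled as latin1 get converted to UTF-8 a second time. The Python sketch below reproduces it outside MySQL. It assumes Python's ISO-8859-1 codec as a stand-in for MySQL's latin1 (MySQL's latin1 is actually closer to Windows-1252, so a few byte values behave differently), and all function names are hypothetical helpers for illustration.

```python
def stored_bytes(text: str) -> bytes:
    """What MediaWiki writes in every mode: the UTF-8 bytes of the text."""
    return text.encode("utf-8")


def as_seen_by_latin1_client(raw: bytes) -> str:
    """How a client that trusts the latin1 column label displays the bytes."""
    return raw.decode("latin-1")


def dumped_without_charset(raw: bytes) -> bytes:
    """Simulate a dump that 'helpfully' converts latin1 to UTF-8.
    The bytes were already UTF-8, so they end up encoded twice."""
    return raw.decode("latin-1").encode("utf-8")


def repair_double_encoding(dumped: bytes) -> bytes:
    """Undo one layer of conversion to recover the original UTF-8 bytes."""
    return dumped.decode("utf-8").encode("latin-1")


if __name__ == "__main__":
    raw = stored_bytes("Café")
    print(as_seen_by_latin1_client(raw))      # the "gibberish" seen in a SELECT
    dumped = dumped_without_charset(raw)
    print(dumped.decode("utf-8"))             # corrupted text in the dump
    print(repair_double_encoding(dumped).decode("utf-8"))  # original text again
```

The repair step works only because every byte value maps to exactly one latin1 character, so the extra conversion is reversible; with the real Windows-1252-like behaviour of MySQL's latin1 some bytes may not round-trip this cleanly.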
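The reply notes that MySQL's UTF-8 mode is limited to the Basic Multilingual Plane. In UTF-8, the characters outside the BMP (above U+FFFF) are exactly the ones that need four bytes, so existing text can be checked for them before choosing that mode. This is a minimal Python sketch; `non_bmp_chars` and `fits_mysql_utf8` are hypothetical helper names, and the check covers only the BMP restriction, not column lengths or other limits.

```python
from typing import List


def non_bmp_chars(text: str) -> List[str]:
    """Return the characters outside the Basic Multilingual Plane (> U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_mysql_utf8(text: str) -> bool:
    """True if every character stays within the BMP, i.e. needs at most
    three bytes in UTF-8."""
    return not non_bmp_chars(text)


if __name__ == "__main__":
    for sample in ["Wikipedia", "Ελληνικά", "music: \U0001D11E"]:
        print(repr(sample), fits_mysql_utf8(sample))
```

Text that fails this check would need the binary or backwards-compatible mode, both of which store the full Unicode range.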
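The larger indexes in UTF-8 mode follow from MySQL sizing index keys for the worst case, using the character set's maximum bytes per character: three for its BMP-only utf8, one for latin1, and one per byte for binary columns, whose length is already counted in bytes. The arithmetic is sketched below in Python for a hypothetical 255-character indexed column. The table values are assumptions for MySQL 4.1/5.0, and the sketch ignores per-key overhead such as length prefixes.

```python
# Assumed maximum bytes MySQL reserves per character in an index key
# (MySQL 4.1/5.0; "utf8" is the 3-byte, BMP-only encoding).
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "binary": 1}


def key_bytes(length: int, charset: str) -> int:
    """Bytes reserved for an index on a column of `length` characters.
    For binary columns, `length` is already a byte count."""
    return length * MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for cs in ("latin1", "binary", "utf8"):
        print(cs, key_bytes(255, cs))
```

For a 255-character key this gives 255 bytes for latin1 and binary but 765 bytes for utf8, three times the space for the same column.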
