Yves!

OK. I agree I don't like this much myself, but we have to live with the multi-lingual aspect of UNICODE. Or rather, we have to agree to be either multi-lingual, and have the cons and pros of that (using UNICODE), or ignore UNICODE and have binary collations etc. And collation also determine equalness. real life example: I have a friend called called Widén, with an accented e. In Sweden, someone called Widen (with a non-accented e, and which is also a perfectly valid name) would sort and compare the same. I.e. in Sweden "Widén" = "Widen". That's just how it works. But the same names, which are binary different but the same using swedish language and swedish collations, would be different when using a french collation. I happen ti live on a street with a ringed and and an umlauted character in the name. When in the US, these two guys have their unlauts removed are are sorted as the umlauts weren't there. Which is OK in US. Which is not OK in sweden. In essence, string comparisons needs to and must use collations when using UNICODE data. You state that "Handel" is different than "Händel". I tend to agree with you, I am swedish by all means. But using a language collation where these characters don't exist just doesn't cut it. UNICODE collation determines not only sorting but also equality (i.e. "é" = "e" etc). Right or wrong, well I think that however you turn something will break. Frankly, I think a lot of blame here is on UNICODE to try to do too much, I'm not a big fan of this myself. But whichever way we do it, it will not be perfect. I think MySQL right now follows the UNICODE spec quite well, although there are still things missing. UNICODE is a reasonable compromise, and I see no better means of dealing with this. So even though I admit I'm no big fan of how UNICODE operates, I've still not figurted out a better way of delaing with it. And you are right of course, you may use the COLLATE keyword also, to enforce a certain collation, although if you want BINARY, I think using BINARY might be slightly more effective. What about a feature request to allow WHERE clauses to use a different collations than the one used for ORDER BY. So collation_connection controls the ORDER BY collation, and then I could say SET collation_connection_comparison = 'utf8_bin'. That would do what you want basically, and I think there might possibly be a need for this.

/Karlsson
Yves Goergen wrote:
On 03.03.2008 10:27 CE(S)T, Anders Karlsson wrote:
> [a lot about why sorting unicode is complicated]
If you want to accknowledge exact matching, and say any character, accented / unlauted etc, is different from any other character, specifiy a binary comparison:
SELECT * FROM phonebook WHERE BINARY name = 'Handel';

Hm, not quite compatible.

The solution I found is using this:

  SELECT * FROM table WHERE column = 'value' COLLATE ...;

But still there binary collation has a different name on MySQL and SQLite. PostgreSQL doesn't support the COLLATE clause, although part of the SQL-92 standard.

But you din't quite get my actual problem. You said that sorting Unicode things is complicated. I agree. I can live with a trade-off for sorting. But I cannot accept incorrect selection of records. When I want something that I can specify exactly, I only want to get that back, nothing else. The same counts for uniqueness constrains.

I've asked a freind who could test the matter with PostgreSQL. He said, it works exactly as expected. Sorting is unicode-like, selection is precise. Why can't MySQL do that, too? Is it so hard to distinguish sorting and selecting?



--
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /  Anders Karlsson ([EMAIL PROTECTED])
 / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Sales Engineer
/_/  /_/\_, /___/\___\_\___/ Stockholm
       <___/   www.mysql.com Cellphone: +46 708 608121
                              Skype: drdatabase



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to