Re: Unicode sorting and binary comparison, please!

Anders Karlsson Mon, 03 Mar 2008 14:18:26 -0800

Yves!

OK. I agree I don't like this much myself, but we have to live with the multi-lingual aspect of UNICODE. Or rather, we have to agree to be either multi-lingual, and have the pros and cons of that (using UNICODE), or ignore UNICODE and have binary collations etc. And collation also determines equality. Real life example: I have a friend called Widén, with an accented e. In Sweden, someone called Widen (with a non-accented e, and which is also a perfectly valid name) would sort and compare the same. I.e. in Sweden "Widén" = "Widen". That's just how it works. But the same names, which are binary different but the same using the Swedish language and Swedish collations, would be different when using a French collation.

I happen to live on a street with a ringed and an umlauted character in the name. When in the US, these two guys have their umlauts removed and are sorted as if the umlauts weren't there. Which is OK in the US. Which is not OK in Sweden.

In essence, string comparisons need to and must use collations when using UNICODE data. You state that "Handel" is different than "Händel". I tend to agree with you, I am Swedish by all means. But using a language collation where these characters don't exist just doesn't cut it. UNICODE collation determines not only sorting but also equality (i.e. "é" = "e" etc). Right or wrong, well, I think that however you turn, something will break.

Frankly, I think a lot of blame here is on UNICODE for trying to do too much, I'm not a big fan of this myself. But whichever way we do it, it will not be perfect. I think MySQL right now follows the UNICODE spec quite well, although there are still things missing. UNICODE is a reasonable compromise, and I see no better means of dealing with this. So even though I admit I'm no big fan of how UNICODE operates, I've still not figured out a better way of dealing with it.

And you are right of course, you may use the COLLATE keyword also, to enforce a certain collation, although if you want BINARY, I think using BINARY might be slightly more effective.

What about a feature request to allow WHERE clauses to use a different collation than the one used for ORDER BY? So collation_connection controls the ORDER BY collation, and then I could say SET collation_connection_comparison = 'utf8_bin'. That would do what you want basically, and I think there might possibly be a need for this.
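
To make the point about collation deciding equality (and not only sort order) concrete, here is a rough Python sketch. It is not how MySQL or the UNICODE collation rules are implemented; fold_all and fold_swedish are hypothetical helpers that only imitate a "strip every accent" collation and a Swedish-like one that keeps å, ä and ö as letters of their own.

```python
import unicodedata

# Letters that a Swedish-like collation treats as distinct (assumption for this sketch).
SWEDISH_LETTERS = set("åäöÅÄÖ")


def fold_all(text):
    """Comparison key that drops every combining mark and ignores case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def fold_swedish(text):
    """Like fold_all, but å, ä and ö keep their identity."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        out.append(ch.lower() if ch in SWEDISH_LETTERS else fold_all(ch))
    return "".join(out)


def equal_under(key, a, b):
    """Two strings are 'equal' under a collation if their keys match."""
    return key(a) == key(b)


if __name__ == "__main__":
    for a, b in [("Widén", "Widen"), ("Händel", "Handel")]:
        print(a, b,
              "binary:", a == b,
              "fold_all:", equal_under(fold_all, a, b),
              "fold_swedish:", equal_under(fold_swedish, a, b))
```

With these stand-in rules, "Widén" and "Widen" differ in a binary comparison but match under both keys, while "Händel" and "Handel" match only when every accent is stripped.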


/Karlsson
Yves Goergen wrote:

On 03.03.2008 10:27 CE(S)T, Anders Karlsson wrote:
> [a lot about why sorting unicode is complicated]
If you want to acknowledge exact matching, and say any character, accented / umlauted etc., is different from any other character, specify a binary comparison:
SELECT * FROM phonebook WHERE BINARY name = 'Handel';
Hm, not quite compatible.

The solution I found is using this:

  SELECT * FROM table WHERE column = 'value' COLLATE ...;
But still, the binary collation has a different name on MySQL and SQLite. PostgreSQL doesn't support the COLLATE clause, although it is part of the SQL-92 standard.
But you didn't quite get my actual problem. You said that sorting Unicode things is complicated. I agree. I can live with a trade-off for sorting. But I cannot accept incorrect selection of records. When I want something that I can specify exactly, I only want to get that back, nothing else. The same goes for uniqueness constraints.
I've asked a friend who could test the matter with PostgreSQL. He said it works exactly as expected. Sorting is unicode-like, selection is precise. Why can't MySQL do that, too? Is it so hard to distinguish sorting and selecting?
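
The split asked for here (precise selection, language-aware sorting) can be tried out with SQLite through Python's sqlite3 module. This is only a sketch under assumptions: accent_fold is a made-up collation registered for the demo, a crude stand-in for a real language collation, and the phonebook rows are invented. SQLite compares TEXT with its built-in BINARY collation unless told otherwise, so the plain WHERE is exact.

```python
import sqlite3
import unicodedata


def fold(text):
    """Key that drops combining marks and case (crude accent-insensitive rule)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def accent_fold(a, b):
    """Collation callback: negative, zero or positive, like a comparator."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_fold", accent_fold)  # hypothetical collation name
conn.execute("CREATE TABLE phonebook (name TEXT)")
conn.executemany(
    "INSERT INTO phonebook VALUES (?)",
    [("Händel",), ("Hanson",), ("Handel",), ("Abel",)],
)


def names(sql):
    return [row[0] for row in conn.execute(sql)]


# Selection with the default BINARY collation: only the exact string matches.
exact = names("SELECT name FROM phonebook WHERE name = 'Handel'")

# Selection with the folding collation: both spellings compare equal.
loose = names(
    "SELECT name FROM phonebook "
    "WHERE name = 'Handel' COLLATE accent_fold ORDER BY rowid"
)

# Sorting by raw bytes versus sorting with the folding collation
# (ties under accent_fold are broken by the binary order).
binary_order = names("SELECT name FROM phonebook ORDER BY name")
folded_order = names(
    "SELECT name FROM phonebook ORDER BY name COLLATE accent_fold, name"
)

if __name__ == "__main__":
    print(exact, loose, binary_order, folded_order)
```

The same table thus gives an exact match in the WHERE clause and an accent-insensitive order in ORDER BY, because the collation is chosen per expression rather than once for the whole connection.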



--
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /  Anders Karlsson ([EMAIL PROTECTED])
 / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Sales Engineer
/_/  /_/\_, /___/\___\_\___/ Stockholm
       <___/   www.mysql.com Cellphone: +46 708 608121
                              Skype: drdatabase



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
