Yves!
OK. I agree I don't like this much myself, but we have to live with
the multi-lingual aspect of UNICODE. Or rather, we have to agree to be
either multi-lingual, and have the cons and pros of that (using
UNICODE), or ignore UNICODE and have binary collations etc. And
collation also determine equalness. real life example: I have a friend
called called Widén, with an accented e. In Sweden, someone called Widen
(with a non-accented e, and which is also a perfectly valid name) would
sort and compare the same. I.e. in Sweden "Widén" = "Widen". That's just
how it works. But the same names, which are binary different but the
same using swedish language and swedish collations, would be different
when using a french collation.
I happen ti live on a street with a ringed and and an umlauted
character in the name. When in the US, these two guys have their unlauts
removed are are sorted as the umlauts weren't there. Which is OK in US.
Which is not OK in sweden.
In essence, string comparisons needs to and must use collations when
using UNICODE data. You state that "Handel" is different than "Händel".
I tend to agree with you, I am swedish by all means. But using a
language collation where these characters don't exist just doesn't cut
it. UNICODE collation determines not only sorting but also equality
(i.e. "é" = "e" etc). Right or wrong, well I think that however you turn
something will break.
Frankly, I think a lot of blame here is on UNICODE to try to do too
much, I'm not a big fan of this myself. But whichever way we do it, it
will not be perfect. I think MySQL right now follows the UNICODE spec
quite well, although there are still things missing. UNICODE is a
reasonable compromise, and I see no better means of dealing with this.
So even though I admit I'm no big fan of how UNICODE operates, I've
still not figurted out a better way of delaing with it.
And you are right of course, you may use the COLLATE keyword also,
to enforce a certain collation, although if you want BINARY, I think
using BINARY might be slightly more effective.
What about a feature request to allow WHERE clauses to use a
different collations than the one used for ORDER BY. So
collation_connection controls the ORDER BY collation, and then I could
say SET collation_connection_comparison = 'utf8_bin'. That would do what
you want basically, and I think there might possibly be a need for this.
/Karlsson
Yves Goergen wrote:
On 03.03.2008 10:27 CE(S)T, Anders Karlsson wrote:
> [a lot about why sorting unicode is complicated]
If you want to accknowledge exact matching, and say any character,
accented / unlauted etc, is different from any other character,
specifiy a binary comparison:
SELECT * FROM phonebook WHERE BINARY name = 'Handel';
Hm, not quite compatible.
The solution I found is using this:
SELECT * FROM table WHERE column = 'value' COLLATE ...;
But still there binary collation has a different name on MySQL and
SQLite. PostgreSQL doesn't support the COLLATE clause, although part
of the SQL-92 standard.
But you din't quite get my actual problem. You said that sorting
Unicode things is complicated. I agree. I can live with a trade-off
for sorting. But I cannot accept incorrect selection of records. When
I want something that I can specify exactly, I only want to get that
back, nothing else. The same counts for uniqueness constrains.
I've asked a freind who could test the matter with PostgreSQL. He
said, it works exactly as expected. Sorting is unicode-like, selection
is precise. Why can't MySQL do that, too? Is it so hard to distinguish
sorting and selecting?
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Anders Karlsson ([EMAIL PROTECTED])
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Sales Engineer
/_/ /_/\_, /___/\___\_\___/ Stockholm
<___/ www.mysql.com Cellphone: +46 708 608121
Skype: drdatabase
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]