It still amazes me that people rely on binary sort order for something like
character data in B-trees. How do you do a case-blind search for 'abc'? Our
database offers efficient, arbitrary collation orders for the data being
extracted from it. Why would anyone want Unicode binary order on their data
if they are extracting it in Shift-JIS encoding? If you want to hide the
internal encoding used by your database, then hide it. I entirely reject
your assertion about database applications.
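
To make the case-blind point concrete, here is a minimal sketch in Python. It
is not our database's implementation. It assumes a hypothetical
collation_key() helper that uses simple case folding as a stand-in for a real
collation key, and a sorted list with bisect as a stand-in for a B-tree. The
index is ordered by the key, not by the raw bytes of any storage encoding.

```python
import bisect

def collation_key(text: str) -> bytes:
    # Hypothetical stand-in for a real collation key:
    # case-folded text, encoded as UTF-8 bytes.
    return text.casefold().encode("utf-8")

# A sorted list stands in for the B-tree; it is ordered by collation key.
words = ["abd", "ABC", "abc", "Abc", "abb"]
index = sorted(words, key=collation_key)
keys = [collation_key(w) for w in index]

def case_blind_search(term: str) -> list[str]:
    # Every entry whose key equals the term's key forms one contiguous run.
    k = collation_key(term)
    lo = bisect.bisect_left(keys, k)
    hi = bisect.bisect_right(keys, k)
    return index[lo:hi]

print(case_blind_search("abc"))
```

Under a plain binary ordering of the stored strings, 'ABC', 'Abc' and 'abc'
would not be adjacent, so the same lookup could not be answered with a single
range scan.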

                                *

-----Original Message-----
From: Markus Kuhn [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2003 7:19 AM
...
UTF-16 remains an ugly miscarriage, because by placing
the surrogates not at the end of the 16-bit space but into the middle of
the code range, it leads to an incompatible binary sorting order in
B-trees with UCS-4 and UTF-8 and therefore is useless for database
applications that want to hide the internal encoding from the user of
B-tree iterators.
...
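
The ordering mismatch described in the quoted text can be reproduced with a
short Python sketch. The two sample characters are my own choice for
illustration and do not come from either message: one BMP character above the
surrogate range and one supplementary character.

```python
# U+FF5E lies in the BMP above the surrogate range (U+D800..U+DFFF);
# U+10000 is the first supplementary character and needs a surrogate pair.
bmp_char = "\uFF5E"
supplementary_char = "\U00010000"
sample = [supplementary_char, bmp_char]

# Sort by the raw bytes of each encoding, as a binary-ordered index would.
utf8_order = sorted(sample, key=lambda s: s.encode("utf-8"))
utf16_order = sorted(sample, key=lambda s: s.encode("utf-16-be"))
utf32_order = sorted(sample, key=lambda s: s.encode("utf-32-be"))

# UTF-8 and UTF-32 (UCS-4) agree with code point order; UTF-16 does not,
# because the surrogate pair D800 DC00 sorts below the code unit FF5E.
print(utf8_order == utf32_order)
print(utf16_order == utf8_order)
```

The difference only shows up for pairs like this one, where a supplementary
character is compared with a BMP character in the range U+E000..U+FFFF.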
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
