ICU 2.0 Collation charts online

Markus Scherer Fri, 02 Nov 2001 17:04:53 -0800

Dear ICU users,

We have generated graphical charts that show the sorting order for many locales with 
the ICU 2.0 data: http://oss.software.ibm.com/icu/charts/collation/
They are intended to give an easier-to-read overview of the sorting order than the 
source data (which lives in CVS, in the locale-specific icu/data/*.txt files).


Please take a look at the charts and notify us of any problems, either via email or, 
if you are sure that something is wrong, by filing a bug. Please see our Contacts 
page on http://oss.software.ibm.com/icu/archives/index.html

Please note the following:

- Currently, many more characters are shown in each chart than are actually used in 
each language. This is because we show entire scripts with all variations. In the 
future, we will need to collect lists of characters that are actually used in a 
language in order to show simpler charts.
However, with the complete script charts, you may be able to see peculiarities that 
might be unintended.

- For characters that expand (red coloring), you need to look at the collation 
weights (fly-over text) to see how they actually sort. For example, a sharp s (ß) 
sorts like ss but is shown as having a primary difference from s (just as ss itself 
differs from s at the primary level). The chart generation code does not currently 
detect that ß is similar to ss, so it cannot show the lower-level difference between 
those two.

- All of the collation sequences are based on the Unicode Collation Algorithm table 
for "sorting everything". This means that many characters of the particular language 
and all of the characters of other languages follow the UCA order. We have a link to 
the UCA charts on unicode.org.
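
To illustrate the point about expansions, here is a minimal Python sketch of how a 
character like ß can map to two collation elements. It is not ICU code and does not 
use ICU data: the table TOY_ELEMENTS, the function sort_key and all weight values 
are invented for this example. It only shows why ß can be primary-equal to ss while 
still being primary-different from s.

```python
# Toy collation elements: (primary, secondary, tertiary) weights.
# All values are invented for illustration; they are NOT the real ICU/UCA weights.
TOY_ELEMENTS = {
    "r": [(0x30, 0x05, 0x05)],
    "s": [(0x31, 0x05, 0x05)],
    "t": [(0x32, 0x05, 0x05)],
    # Expansion: one character maps to two elements. The primaries match "ss";
    # only the tertiary weight of the second element differs.
    "ß": [(0x31, 0x05, 0x05), (0x31, 0x05, 0x06)],
}


def sort_key(text, strength=3):
    """Build a level-by-level sort key: all primaries, then secondaries, ..."""
    elements = [ce for ch in text for ce in TOY_ELEMENTS[ch]]
    key = []
    for level in range(strength):
        key.extend(ce[level] for ce in elements)
        key.append(0)  # level separator, lower than any weight
    return tuple(key)


if __name__ == "__main__":
    # At primary strength, ß and ss produce the same key.
    print(sort_key("ß", strength=1) == sort_key("ss", strength=1))
    # At full strength they differ, and ß still sorts after plain s.
    print(sorted(["ß", "st", "ss", "s"], key=sort_key))
```

A chart that compares only single characters sees ß next to s and reports a primary 
difference, because the second element of ß adds another primary weight. Finding the 
closer relationship to ss would require comparing ß against strings, not just against 
other single characters.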


Enjoy, and thank you very much for your help,

markus
