...
> The working group co-chairs would like to conduct a strawpoll to
> gauge the consensus of the working group on reordering
> http://www.ietf.org/internet-drafts/draft-ietf-idn-lsb-ace-02.txt

From http://www.ietf.org/internet-drafts/draft-ietf-idn-lsb-ace-02.txt:

> As such examples shows, most ACE algorithms are designed to favor
> latin and small script blocks over very large blocks like han and
> hangeul. For CJK people (in China,Hongkong,Macao,Japan,South/North
> Korea and Taiwan), that disadvantage results in longer ACE labels
> and less room for free-form long names. It is clear that there
> must be some improvements to ACEs to compensate this unfair
> disadvantage.

It's a bit strange that this comes from quarters where there is
already quite a lot of "compaction" in the representation of text.
A single Han ideograph expresses "more" than a single letter in other
scripts. And a single Hangul syllable character expresses from 2 to 6
letters in one character. Hangul is fundamentally an alphabetic
script, with 17 consonant letters and 11 vowel letters, plus some
variant (and historic) letters.

<ironic-mode>
I think it's an urgent matter to follow the example of Hangul for all
alphabetic scripts and encode tens of thousands of syllables in the
Latin, Greek, etc. scripts, so that we can compensate this unfair
disadvantage that Latin, etc. have compared to Hangul (and Han, though
that is not alphabetic). And then, of course, we have to have a
reordering stage, where the most common syllables are ordered so that
an ACE-encoded string also gets as short as possible.
</ironic-mode>

Of course, which languages should be considered when selecting Latin
or Cyrillic syllables for encoding, and which languages' statistics
should be used for the reordering, will be hotly debated! ;-)

	On the ironic side
		/kent k
