The UTC recently agreed to clarify that the syllable structure is more general, along the lines that Kent is describing. I'll add more info when I have the time.
Mark
-----

Ὀλίγοι ἔμφρονες πολλῶν ἀφρόνων φοβερώτεροι - Πλάτωνος
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Kent Karlsson" <[EMAIL PROTECTED]>
To: "Soobok Lee" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: "Jungshik Shin" <[EMAIL PROTECTED]>
Sent: Tuesday, November 13, 2001 05:04
Subject: Re: Hangul and IDN (was Re: [idn] reordering strawpoll)

> Hi!
>
> >> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> >> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is fine
> >> so far. But <U+1101, U+1161> is in turn equivalent to <U+1100, U+1100, U+1161>
> >> (G, G, A),
> >
> >> but this equivalence is neither a canonical equivalence, as it
> >> should have been, nor a compatibility equivalence. Still, the latter letter
> >> sequence represents EXACTLY the same syllable as the two earlier character
> >> sequences, and a proper rendering engine (of which there are already some,
> >> I'm told) would correctly render the three sequences in the same way.
> >
> >Are you raising this possibility: U+1101 <---> U+1100 U+1000 (GG <-> G G) ?
> >Design of conjoining (cluster) jamos treat two choseong sequence
> >U+1100 U+1000 as illegal sequence (syllable break condition).
>
> (You mean <U+1100, U+1100>.) No, that is not a syllable break condition.
> See tables 3-4 and 3-5 on page 53 in TUS 3.0 (link below).
>
> (B.t.w. apart from algorithmic decomposition (per se) of Hangul syllable characters,
> the cluster Jamos are not needed, and should ideally not be used. That is not
> spelled out in TUS 3.0, though...)
>
> > That is described somewhere in Unicode 3,
> >chapter 3, section 11. That will help you.
>
> Please reread that section carefully! (Link below.) In particular page 53.
>
> >if some rendering engine display the two as the same syllable,
> >I suspect the product is buggy or beyond the standard. :-)
>
> No, it's not beyond the Unicode standard at all.
> See page 53 of TUS 3.0,
> http://www.unicode.org/unicode/uni2book/ch03.pdf which says (my emphasis):
> "A standard syllable block is composed of a *sequence* of choseong followed
> by a *sequence* of jungseong and optionally a *sequence* of jongseong."
>
> That that description is only about NFD form is not spelled out, nor
> is the fact that combining characters, in particular a Hangul tone mark,
> may follow (logically they apply to the entire syllable!). Whether to consider
> the combining characters as part of the syllable or not, I think is a matter
> of taste. If we also take combining characters into account, but still NFD,
> the syntax for a Hangul syllable is:
>
>     Hangul-syllable-NFD ::= C+ V+ F* T*
>
> where C is a choseong, V is a jungseong (vowel), F is a jongseong, and T is a
> combining character (like a Hangul tone mark). (I'll ignore the FILLER issues
> for the moment, including their automatic insertion.)
>
> This is needed to be able to spell historic (and future) Hangul texts that may use
> consonant or vowel clusters that are not given a character of their own.
>
> Taking into account what NFC may cause, the full Hangul syllable syntax is:
>
>     Hangul-syllable ::=
>           C* CVsyllable V* F* T*
>         | C* CVFsyllable F* T*
>         | C+ V+ F* T*
>
> where CVsyllable is a consonants-vowels syllable character, and CVFsyllable is a
> consonants-vowels-consonants syllable character.
>
> >Would you tell me the version of the rendering engine ?
>
> I'm told(!) this is (properly!) implemented at least in Windows XP and
> IE 6 (maybe also IE 5.5). Perhaps also elsewhere (I'm not keeping track).
>
> Kind regards
> /kent k
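
For anyone who wants to experiment with the syllable syntax quoted above, here is a rough Python sketch. It is not taken from TUS or from any rendering engine. The names jamo_class and is_hangul_syllable are made up for illustration, the jamo ranges are my own assumption (the U+1100 block only, fillers ignored as in Kent's message), and T is approximated by the two Hangul tone marks U+302E and U+302F rather than by every combining character.

```python
import re
import unicodedata

# Precomposed syllable arithmetic: 11172 syllables from U+AC00,
# 28 trailing-consonant slots per leading-consonant/vowel pair.
S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28


def jamo_class(ch: str) -> str:
    """Map one character to a symbol of the grammar (hypothetical helper)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115E:   # choseong (assumed range, filler excluded)
        return "C"
    if 0x1161 <= cp <= 0x11A7:   # jungseong (assumed range, filler excluded)
        return "V"
    if 0x11A8 <= cp <= 0x11FF:   # jongseong (assumed range)
        return "F"
    if cp in (0x302E, 0x302F):   # Hangul tone marks, standing in for T
        return "T"
    if S_BASE <= cp < S_BASE + S_COUNT:
        # "S" = CVsyllable (no final), "X" = CVFsyllable (has a final)
        return "S" if (cp - S_BASE) % T_COUNT == 0 else "X"
    return "?"


# The three alternatives of the full Hangul-syllable rule, over class symbols.
SYLLABLE = re.compile(r"C*SV*F*T*|C*XF*T*|C+V+F*T*")


def is_hangul_syllable(text: str) -> bool:
    """True if text as a whole matches the Hangul-syllable rule."""
    classes = "".join(jamo_class(ch) for ch in text)
    return SYLLABLE.fullmatch(classes) is not None


if __name__ == "__main__":
    gga = "\uAE4C"                 # precomposed GGA
    g_g_a = "\u1100\u1100\u1161"   # G, G, A
    print(unicodedata.normalize("NFD", gga) == "\u1101\u1161")
    print(is_hangul_syllable(gga), is_hangul_syllable(g_g_a))
    # NFC composes only the second G with A, giving the "C* CVsyllable" case.
    print(unicodedata.normalize("NFC", g_g_a) == "\u1100\uAC00")
```

The last check illustrates the point of the thread: normalization does not turn <U+1100, U+1100, U+1161> into U+AE4C, yet both strings are accepted as a single syllable by the grammar.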
