Hi,

Ok, what you describe sounds very close to the OpenType spec:

  http://www.microsoft.com/typography/otfntdev/hangulot/

and what the ICU Layout Hangul shaper does.  The one part I don't
understand is the section "Compose Old Hangul Jamo combinations" under:

  http://www.microsoft.com/typography/otfntdev/hangulot/shaping.htm

I can't make sense of that part, since Appendix B does not list what the
jamos compose to.

Please review those documents and share any insights you may have.  I'll
go ahead with implementing a shaper then.

behdad

On 13-04-06 01:32 PM, Dohyun Kim wrote:
> 2013/4/6 Behdad Esfahbod <[email protected]>:
>> On 13-04-05 06:45 AM, Dohyun Kim wrote:
>>> 2013/4/5 Dohyun Kim <[email protected]>:
>>>> Sorry for the noise.
>>>> I booted a Windows machine and tested uniscribe a bit.  My guess
>>>> on how uniscribe works on Hangul is:
>>>>
>>>> 1. decompose hangul syllables to jamos
>>>>
>>>> 2. compose single jamos into composite jamos as far as possible
>>>>     eg., U+1100 U+1100 => U+1101
>>>>     Note: mapping table for this composition is available at
>>>>     ftp://ktug.org/ktug/hcr-lvt/composejamotojamo.map
>>>>
>>>
>>> Well, after a bit more testing, it turned out that this second step is
>>> not what uniscribe does.  Sorry for the wrong information.  I had
>>> guessed this on the basis of the old unicode standard.  Recent unicode
>>> also does not recommend using multiple single jamos to get a composite
>>> jamo.
>>>
>>> Instead, uniscribe inserts fillers (U+115F U+1160) around a single
>>> lone jamo that does not make up a syllable block.
>>
>> Interesting.  So, for a lone T jamo, both 115F and 1160 are inserted?
>
> Yes, when fillers are inserted.  But actually uniscribe does not seem
> to insert fillers.  Sorry for my immature conclusion.  Today I
> downloaded the harfbuzz win32 binary and tested some jamo texts using
> hb-shape.  This utility gave me more accurate information than I could
> obtain with the naked eye.  Contrary to my expectation, the output of
> hb-shape did not have any traces of fillers.
> So, it seems evident that uniscribe does not insert fillers.  And it
> seems also evident that uniscribe sets boundaries between syllable
> blocks, so that multiple single jamos cannot be concatenated into a
> composite jamo.
>
> Let us suppose an input text <U+1100 U+AC00 U+11F0>.  My guess at what
> uniscribe does:
>
> 1. decompose syllables to jamos: we get <U+1100 U+1100 U+1161 U+11F0>
>
> 2. demarcate each syllable block by setting boundaries in between: we
>    get <U+1100 | U+1100 U+1161 U+11F0> where | means syllable boundary.
>    Probably this is related to the so-called "cluster."  Yesterday I
>    misconceived this boundary (maybe ZWNJ but I am not sure) as a filler.
>    BTW, according to the old standard, U+1100 U+1100 are concatenated to
>    U+1101, so the result would be a single syllable block <U+1101 U+1161
>    U+11F0>.  Nowadays we do not need this jamo-to-jamo composition,
>    because all the jamos known to date have been registered since
>    unicode version 5.2.
>
> 3. try to re-compose jamos into a syllable letter.  But as our sample
>    text matches the case of <L V OT>, nothing is converted.
>
> 4. apply font features: we get <U+1100 | U+1100.s U+1161.s U+11F0.s>
>    where ".s" means substituted glyph.
>
> As I said before, we Koreans do not input text like <U+AC00 U+11F0> in
> practice.  However, there remains some possibility that some
> applications or libraries pass unicode-normalized text to harfbuzz,
> in which case harfbuzz would give an incorrect result.  So I changed
> my mind, and now I suggest implementing a hangul shaper.
> It is not an urgent task, though; harfbuzz works quite well already.
> However, we want harfbuzz to be as perfect as possible.
>
> Regards,
>
>
>>>> 3. compose jamos into hangul syllables as far as possible
>>>>     Note: this process complies with KSC 1026-1.
>>>>     In other words, jamo
>>>>     sequence <L V> in <L V OT> is *not* converted to LV, where L means
>>>>     leading consonant, V means medial vowel, OT means *old* trailing
>>>>     consonant (U+11C3..U+11FF U+D7CB..U+D7FB), and LV means Hangul
>>>>     syllable equivalent to L V.
>>>>
>>>> 4. apply opentype layout features
>>>>
>>>> It is somewhat complicated but gives a perfect result.  It satisfies
>>>> both the Korean and Unicode standards.  Nevertheless, what current
>>>> harfbuzz does is quite excellent as well and I am satisfied with it.  I
>>>> am reporting just for reference.
>>>>
>

--
behdad
http://behdad.org/
_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
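
For reference, steps 1 to 3 of the guess quoted above (decompose, set
syllable boundaries, recompose unless an old trailing consonant is
present) can be sketched in Python.  This is only an illustration of the
description in the thread, not what uniscribe or harfbuzz actually does.
The function names (decompose, segment, recompose, prepare) are
hypothetical, and the boundary rule (a block holds at most one L, one V
and one T) is an assumption chosen to reproduce the boundary in the
<U+1100 U+AC00 U+11F0> example.  The arithmetic is the standard Unicode
Hangul syllable composition; step 4 (font features) is left out.

```python
# Standard Unicode Hangul syllable constants.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables


def jamo_class(cp):
    """Return 'L', 'V', 'T', or None for a non-jamo code point."""
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    return None


def decompose(cps):
    """Step 1: replace each precomposed syllable by its jamos."""
    out = []
    for cp in cps:
        s = cp - S_BASE
        if 0 <= s < S_COUNT:
            out.append(L_BASE + s // N_COUNT)
            out.append(V_BASE + (s % N_COUNT) // T_COUNT)
            if s % T_COUNT:
                out.append(T_BASE + s % T_COUNT)
        else:
            out.append(cp)
    return out


def segment(jamos):
    """Step 2 (assumed rule): a V may join a preceding L, a T may join a
    preceding V; anything else starts a new block."""
    follows = {"L": "V", "V": "T"}
    blocks, last = [], None
    for cp in jamos:
        cls = jamo_class(cp)
        if blocks and cls is not None and follows.get(last) == cls:
            blocks[-1].append(cp)
        else:
            blocks.append([cp])
        last = cls
    return blocks


def recompose(blocks):
    """Step 3: compose <L V> or <L V T> only when every jamo is modern,
    so <L V OT> (old trailing consonant) stays decomposed."""
    out = []
    for b in blocks:
        l, v = (b + [0, 0])[:2]
        t = b[2] if len(b) == 3 else T_BASE
        modern = (len(b) in (2, 3)
                  and 0x1100 <= l <= 0x1112
                  and 0x1161 <= v <= 0x1175
                  and T_BASE <= t <= 0x11C2)
        if modern:
            index = ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT
            out.append([S_BASE + index + (t - T_BASE)])
        else:
            out.append(b)
    return out


def prepare(cps):
    """Steps 1-3 in sequence; returns a list of syllable blocks."""
    return recompose(segment(decompose(cps)))
```

With this sketch, prepare([0x1100, 0xAC00, 0x11F0]) yields the two blocks
<U+1100> and <U+1100 U+1161 U+11F0>, matching the example in the thread,
while prepare([0xAC00, 0x11A8]) yields the single syllable U+AC01 because
U+11A8 is a modern trailing consonant.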
