I've now pushed a Hangul shaper out to HarfBuzz master.  Here is the comment explaining what it tries to do:

/* Hangul syllables come in two shapes: LV, and LVT.  Of those:
 *
 *   - LV can be precomposed, or decomposed.  Let's call those
 *     <LV> and <L,V>,
 *   - LVT can be fully precomposed, partially precomposed, or
 *     fully decomposed.  I.e. <LVT>, <LV,T>, or <L,V,T>.
 *
 * The composition / decomposition is mechanical.  However, not
 * all <L,V> sequences compose, and not all <LV,T> sequences
 * compose.
 *
 * Here are the specifics:
 *
 *   - <L>: U+1100..115F, U+A960..A97F
 *   - <V>: U+1160..11A7, U+D7B0..D7C7
 *   - <T>: U+11A8..11FF, U+D7C8..D7FF
 *
 *   - Only the <L,V> sequences for the 11xx ranges combine.
 *   - Only <LV,T> sequences for T in U+11A8..11C3 combine.
 *
 * Here is what we want to accomplish in this shaper:
 *
 *   - If the whole syllable can be precomposed, do that,
 *   - Otherwise, fully decompose.
 *
 * That is, of the different possible syllables:
 *
 *   <L>
 *   <L,V>
 *   <L,V,T>
 *   <LV>
 *   <LVT>
 *   <LV, T>
 *
 * - <L> needs no work.
 *
 * - <LV> and <LVT> can stay the way they are if the font supports them,
 *   otherwise we should fully decompose them if the font supports that.
 *
 * - <L,V> and <L,V,T> we should compose if the whole thing can be composed.
 *
 * - <LV,T> we should compose if the whole thing can be composed, otherwise
 *   we should decompose.
 */

Please test.

behdad

On 13-04-18 09:44 AM, Dohyun Kim wrote:
> 2013/4/18 Dohyun Kim <[email protected]>:
>> 2013/4/18 Behdad Esfahbod <[email protected]>:
>>> When are the OpenType features applied, after all those processes are done?
>>
>> If possible, please apply "ccmp" feature before all those processes.
>
> On a second thought, now I think it is more efficient and compliant to
> the unicode standard to apply "ccmp" feature after decomposition of
> hangul syllables and before setting syllable boundaries.
>
>> And "*jmo" features after all those processes.
>>
>>> Are the '*jmo' features applied to all glyphs?
>>
>> No. Only to those well-formed syllable block <M? L V T?>.
>>
>>>
>>> On 13-04-16 11:29 PM, Dohyun Kim wrote:
>>>> http://ktug.org/~nomos/harfbuzz-hangul/hangulshaper.pdf
>>>>
>>>> Regards,
>>>>
>>>> 2013/4/17 Behdad Esfahbod <[email protected]>:
>>>>> Ok, given how confusing this thread has become, please create a Google
>>>>> Doc, and write down what you think the HarfBuzz Hangul shaper should do.
>>>>> Modify it as much as you want, but keep it as short as possible.  Please
>>>>> make the doc commentable by the public, and send the link here.
>>>>>
>>>>> Thanks,
>>>>> behdad
>>>>>
>>>>> On 13-04-16 10:10 AM, Dohyun Kim wrote:
>>>>>> 2013/4/16 Dohyun Kim <[email protected]>:
>>>>>>> 2013/4/15 Dohyun Kim <[email protected]>:
>>>>>>>>
>>>>>>>> The behavior of new Uniscribe is quite confusing and seems to be
>>>>>>>> inconsistent on some points. I cannot describe concisely what it
>>>>>>>> does. But it is evident that it renders correctly only those input
>>>>>>>> sequence which is compliant to KS X 1026-1.
>>>>>>>>
>>>>>>>
>>>>>>> OK. My guess about the behavior of new Uniscribe:
>>>>>>>
>>>>>>> 1. demarcate syllable blocks according to KS X 1026-1
>>>>>>>
>>>>>>> - between L and L, V and V, T and T, or L and T (these are illegal
>>>>>>>   string)
>>>>>>> - between V and L, T and L, or M and L (these are legal break point)
>>>>>>> - between Jamo and non-Jamo character including Hangul syllables
>>>>>>> - but not between L and V, V and T, T and M, V and M, LVT and M, LV
>>>>>>>   and M.
>>>>>>
>>>>>> Oh, I have left out one stunning thing. I really dislike this sort of
>>>>>> behavior:
>>>>>>
>>>>>> - The Jamo sequence of <L V T> is divided into <L | V | T>, if
>>>>>>   equivalent <LVT> syllable exists.
>>>>>> - Likewise, <L V> sequence is divided into <L | V>, if it is not
>>>>>>   followed by T and equivalent <LV> syllable exists.
>>>>>>
>>>>>>>
>>>>>>> where LVT and LV are Hangul syllables; L, V, and T are Jamos; M means
>>>>>>> Hangul tone marks (U+302E or U+302F)
>>>>>>>
>>>>>>> 2. reorder Hangul tone marks
>>>>>>>
>>>>>>> - if syllable block is well-formed, move M from the last to the
>>>>>>>   first of the cluster.
>>>>>>> - if syllable is not well-formed, Uniscribe does not move M.
>>>>>>>   Instead, U+25CC is inserted after M.
>>>>>>>
>>>>>>> where "well-formed" means <LVT>, <LV>, <L V T>, or <L V>.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Dohyun Kim
>>>>>> College of Law, Dongguk University
>>>>>> Seoul, Republic of Korea
>>>>>>
>>>>>
>>>>> --
>>>>> behdad
>>>>> http://behdad.org/

--
behdad
http://behdad.org/
_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
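
For anyone who wants to try the "compose the whole syllable, otherwise fully decompose" rule from the comment outside of HarfBuzz, here is a rough Python sketch. It is not the code that was pushed. It uses the standard Unicode Hangul composition arithmetic, where the composable jamo are L in U+1100..1112, V in U+1161..1175 and T in U+11A8..11C2, which is narrower than the broad block ranges listed in the comment. The names `normalize_syllable` and `font_has` are made up for this illustration; `font_has` only stands in for "the font has a glyph for this code point", and the sketch does not model the case where the font lacks the jamo glyphs too.

```python
# Standard Unicode Hangul composition constants.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def is_combining_l(u):
    return L_BASE <= u < L_BASE + L_COUNT


def is_combining_v(u):
    return V_BASE <= u < V_BASE + V_COUNT


def is_combining_t(u):
    # T_BASE itself means "no trailing consonant", so it is excluded.
    return T_BASE < u < T_BASE + T_COUNT


def is_syllable(u):
    return S_BASE <= u < S_BASE + S_COUNT


def decompose(s):
    """Fully decompose a precomposed <LV> or <LVT> into jamo."""
    idx = s - S_BASE
    l = L_BASE + idx // N_COUNT
    v = V_BASE + (idx % N_COUNT) // T_COUNT
    t = idx % T_COUNT
    return [l, v] + ([T_BASE + t] if t else [])


def normalize_syllable(cps, font_has=lambda u: True):
    """Hypothetical helper: takes the code points of ONE syllable in any of
    the forms <L>, <L,V>, <L,V,T>, <LV>, <LVT>, <LV,T> and returns either
    the single precomposed syllable or the fully decomposed jamo."""
    # Flatten any precomposed part so every input form looks the same.
    jamo = []
    for u in cps:
        jamo.extend(decompose(u) if is_syllable(u) else [u])

    # Compose only if the whole thing composes and the font has the result.
    if (len(jamo) in (2, 3)
            and is_combining_l(jamo[0])
            and is_combining_v(jamo[1])
            and (len(jamo) == 2 or is_combining_t(jamo[2]))):
        t_index = jamo[2] - T_BASE if len(jamo) == 3 else 0
        s = (S_BASE
             + ((jamo[0] - L_BASE) * V_COUNT + (jamo[1] - V_BASE)) * T_COUNT
             + t_index)
        if font_has(s):
            return [s]

    # Otherwise leave the syllable fully decomposed.
    return jamo
```

With this, `normalize_syllable([0xAC00, 0x11A8])` (an <LV,T> pair) gives the single syllable U+AC01, while `normalize_syllable([0xAC00, 0x11C3])` falls back to three jamo because that T does not compose.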
