On 20/1/14 02:21, Roozbeh Pournader wrote:
Jonathan,

I was wondering if the new patches would have all the canonically
equivalent character sequences rendered the same way. Microsoft people
have said publicly that their Hangul shaper intentionally doesn't do that.


The intention is that canonically equivalent sequences should render the same. I'm aware that MS doesn't do this in certain cases, as mentioned:

    (b) a handful of words where there's an <LV, T> sequence that uniscribe
    doesn't support (it has no corresponding LVT syllable), but we
    handle by decomposing to <L, V, T> and applying jamo features.

An example of this is <U+B4C0,U+11F0>. Uniscribe (using Malgun Gothic) renders the default, unshaped glyphs for U+B4C0 (an LV syllable) and U+11F0 (a trailing jamo) separately. Harfbuzz instead decomposes U+B4C0 into separate leading- and vowel-jamo glyphs and then applies the ljmo/vjmo/tjmo features, so that the three jamos are properly composed into a single syllable block.

Thus, with harfbuzz the two sequences
  <U+B4C0,U+11F0>
  <U+1103,U+1172,U+11F0>
render the same. As I understand things, the Korean standard says the former spelling should not be used, but IMO that cannot override the fact that the Unicode standard defines them as canonically equivalent, so rendering them identically is correct.
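For anyone who wants to check the equivalence claim, here is a minimal sketch in Python. It is not harfbuzz code; it only assumes the standard Hangul syllable arithmetic from the Unicode standard and the stdlib unicodedata module. The helper name decompose_lv is made up for illustration.

```python
import unicodedata

# Constants from the Unicode Hangul syllable decomposition arithmetic.
S_BASE, L_BASE, V_BASE = 0xAC00, 0x1100, 0x1161
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant


def decompose_lv(syllable: str) -> str:
    """Split a precomposed LV syllable into <L, V> (hypothetical helper)."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * N_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    if s_index % T_COUNT != 0:
        raise ValueError("syllable is LVT, not LV")
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    return chr(leading) + chr(vowel)


composed = "\uB4C0\u11F0"            # <LV, T>
spelled_out = "\u1103\u1172\u11F0"   # <L, V, T>

lv_decomposed = decompose_lv("\uB4C0")

# Canonically equivalent strings have identical NFD (and NFC) forms.
same_nfd = (unicodedata.normalize("NFD", composed)
            == unicodedata.normalize("NFD", spelled_out))
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", spelled_out))

print([f"U+{ord(c):04X}" for c in lv_decomposed], same_nfd, same_nfc)
```

Note that NFC leaves U+11F0 as a separate character after U+B4C0, since there is no precomposed LVT syllable for an Old Hangul trailing jamo. That is exactly the case where the shaper has to decompose and rely on the jamo features.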

What the patched harfbuzz still -doesn't- implement is shaping "spelled out" versions of Old Hangul sequences with multiple L, V and/or T jamos. The old MS Hangul spec gave an example where the leading jamo now encoded at U+A972 (CHOSEONG PIEUP-SIOS-THIEUTH) was encoded as the sequence <U+1107,U+1109,U+1110> and then composed (and similarly for the V and T jamos), so that a complete syllable was composed from a sequence of the form <L, L, L, V, V, V, T, T, T>.

I experimented with a patch that would support this, and the result looked OK (to my un-Korean eyes) when using the UnBatang font (not so good with Malgun Gothic). However, this is not canonically equivalent, and my understanding is that with Unicode having added all the complex jamos, there is no longer any real requirement or desire to support such sequences. So I haven't included this.
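The "not canonically equivalent" point can be checked the same way. This is only a sketch using Python's stdlib unicodedata (it assumes a Unicode database recent enough to include U+A972, i.e. Unicode 5.2 or later); the variable names are just for illustration.

```python
import unicodedata

complex_l = "\uA972"                  # CHOSEONG PIEUP-SIOS-THIEUTH
spelled_out_l = "\u1107\u1109\u1110"  # PIEUP, SIOS, THIEUTH as separate jamos

# An empty decomposition string means the character has no
# canonical (or compatibility) decomposition mapping.
has_decomposition = unicodedata.decomposition(complex_l) != ""

# Normalization never maps one spelling to the other.
equivalent = (unicodedata.normalize("NFD", complex_l)
              == unicodedata.normalize("NFD", spelled_out_l))
composes = unicodedata.normalize("NFC", spelled_out_l) == complex_l

print(has_decomposition, equivalent, composes)
```

So, unlike the <LV, T> case, nothing in normalization ties the spelled-out jamo run to the complex jamo; supporting it would be purely a shaper-level convention.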

JK

_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
