Re: [HarfBuzz] 'vert' substitutions in CJK fonts
On 13-02-04 12:34 PM, Grigori Goronzy wrote:
> The GSUB table of problematic fonts typically looks a bit too... minimal.
> Here's an example, that's the MOTOYA LMaru font, Android's standard CJK font:
> ...
> So substitutions are only used if the run that is shaped has Katakana (kana)
> script and language is set to Japanese (JAN). It works if I explicitly set
> the language and force the script to Katakana.
>
> But, in practice, that's of course not true! First, it breaks as soon as the
> system language is not Japanese, unless the language has been overridden.
> Second, not only Katakana characters have vertical variants. Punctuation
> might or might not be substituted depending on context, because punctuation
> characters have common script and assume the script of characters around
> them. If they're next to Kanji characters, it will break.

Grigori, welcome to the darker sides of text rendering :). This is what Pango does, and what I eventually want to make easier to do with HarfBuzz:

  - Say the system language is en. Upon detecting Katakana, Pango proceeds to resolve the language to assign to that run of text. Pango knows which scripts each language tag (locale) uses. As such it correctly detects that English doesn't use Katakana, so this run can't be in English. It then goes searching for a better language tag for the run:

    * If the env vars $LANGUAGE and/or $PANGO_LANGUAGE are set, it looks in the languages listed there (each is a list of language tags) and picks the first one that uses Katakana,

    * If that fails, it knows that the most likely language tag for Katakana is ja, so it uses that.

This is really useful. For example, by default when Pango sees text in Arabic script, it behaves as if it's in the Arabic language. But if I set LANGUAGE=en,fa on my system, then Pango will attribute untagged Arabic-script text to Persian instead of Arabic.

This is all in pango-language.c. Check it out.
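The resolution order described above can be sketched roughly as follows. This is an illustrative Python model, not Pango's code: the tables `LANGUAGE_SCRIPTS` and `DEFAULT_LANGUAGE_FOR_SCRIPT`, the function `resolve_language`, and the use of ISO 15924 script codes are all assumptions made for the example, and the data is heavily abridged.

```python
# Hypothetical, abridged table of which scripts each language tag uses.
# Pango keeps far more complete data; these entries are illustrative only.
LANGUAGE_SCRIPTS = {
    "en": {"Latn"},
    "ja": {"Hani", "Hira", "Kana"},
    "ar": {"Arab"},
    "fa": {"Arab"},
}

# Hypothetical "most likely language" per script, used as the last resort.
DEFAULT_LANGUAGE_FOR_SCRIPT = {
    "Kana": "ja",
    "Hira": "ja",
    "Arab": "ar",
    "Latn": "en",
}


def resolve_language(script, system_language, preferred_languages=()):
    """Pick a language tag for a run of text written in `script`."""
    # 1. Keep the system language if it can be written in this script.
    if script in LANGUAGE_SCRIPTS.get(system_language, set()):
        return system_language
    # 2. Otherwise take the first user-preferred language (for example the
    #    tags listed in $LANGUAGE) that uses the script.
    for lang in preferred_languages:
        if script in LANGUAGE_SCRIPTS.get(lang, set()):
            return lang
    # 3. Fall back to the most likely language for the script, if known.
    return DEFAULT_LANGUAGE_FOR_SCRIPT.get(script)
```

With this model, a Katakana run under an English system language resolves to `ja`, and an Arabic-script run resolves to `fa` once the preferred list is `["en", "fa"]`, matching the two examples in the message.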
The other problem you point out is also handled, by resolving Script=Common/Inherited characters to their neighboring scripts. So in this case, even the punctuation will be marked 'kana'. That said, it is a known shortcoming of OpenType that a lone punctuation character cannot hit any script tables other than DFLT...

> Should fonts with GSUB tables like that be considered broken?

Yes, such a font should define a default language system. But then, many fonts in common use are broken one way or another...

> What does Uniscribe do to make this work?

Don't know.

> And lastly, can I force HarfBuzz to just use the first 'vert' substitution
> lookup in case there's none to be found with matching or DFLT
> script/language system?

Not really / easily. You can use the hb-ot.h API to detect that, and find the OT LangSys tag that *does* have the substitution, then use hb_ot_tag_to_language() to get a language tag that, when passed back to HarfBuzz, will choose that substitution.

Cheers,
-- 
behdad
http://behdad.org/
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
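The workaround suggested in the last answer (find a LangSys that does carry the feature, then ask for it explicitly) can be modelled on a toy copy of the GSUB dump from this thread. The sketch below is plain Python and does not call HarfBuzz; the dictionary layout, the simplified matching order, and the name `find_feature` are assumptions for illustration. Real code would query the font through hb-ot.h and convert the tag it finds with hb_ot_tag_to_language().

```python
# Toy model of the GSUB script list of the font discussed in this thread:
# a single 'kana' script, no default language system, one 'JAN' LangSys.
GSUB = {
    "kana": {
        "default": None,               # no default language system
        "langsys": {"JAN": ["vert"]},  # features per language system
    },
}


def find_feature(gsub, script, language, feature):
    """Return the (script, langsys) pair that provides `feature`, or None.

    A langsys of None stands for the script's default language system.
    """
    def has(script_tag, lang_tag):
        record = gsub.get(script_tag)
        if record is None:
            return False
        if lang_tag is None:
            features = record["default"]
        else:
            features = record["langsys"].get(lang_tag)
        return features is not None and feature in features

    # Simplified regular matching (an assumption, not HarfBuzz's exact
    # rules): requested LangSys, then script default, then DFLT script.
    for candidate in ((script, language), (script, None), ("DFLT", None)):
        if has(*candidate):
            return candidate

    # Fallback: take the first language system anywhere in the table that
    # carries the feature.
    for script_tag, record in gsub.items():
        if has(script_tag, None):
            return (script_tag, None)
        for lang_tag in record["langsys"]:
            if has(script_tag, lang_tag):
                return (script_tag, lang_tag)
    return None
```

For a run tagged as Han script with a Chinese language, `find_feature(GSUB, "hani", "ZHS", "vert")` falls through to the fallback loop and returns `("kana", "JAN")`, which is the pair the caller would then request explicitly.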
Re: [HarfBuzz] 'vert' substitutions in CJK fonts
Hi,

Thank you for opening the interesting discussion.

I think the script names in the OpenType spec are not identical to the block names in Unicode; kana does not specify only the small group of katakana and hiragana, but also the group including katakana, hiragana, CJK ideographs, CJK punctuation, CJK symbols, etc.

When I worked on poppler (a PDF rendering library), I ran into a similar problem:
http://lists.freedesktop.org/archives/poppler/2012-March/008860.html

I should note that the default language system strategy would not work well with (old versions of?) the Batang font (a Korean font bundled with Microsoft Windows).

The question there was how the OpenType layout features should be configured when vertical text is requested without an embedded font; I used the combinations CHN/hani for Chinese (Simplified or Traditional), JAN/kana for Japanese, and KOR/hang for Korean. But that was designed to fit the internal design of poppler; more comprehensive consideration would be expected for real i18n software.

Regards,
mpsuzuki

Grigori Goronzy wrote:
> Hi,
>
> a user of my library reported that vertical alternates are not correctly
> substituted in many CJK fonts. I am a bit puzzled by this.
>
> When doing vertical layout of Japanese text, the 'vert' feature is enabled
> in the library to select vertical variants of some Kanji, Kana and
> punctuation characters. This works fine with many fonts, but with some it
> does not.
>
> The GSUB table of problematic fonts typically looks a bit too... minimal.
> Here's an example, that's the MOTOYA LMaru font, Android's standard CJK font:
>
> Table 0 of 16: GSUB (0x010c+0x016c)
>     1 script(s) found in table
>     Script 0 of 1: kana
>         No default language system
>         1 language system(s) found in script
>         Language System 0 of 1: JAN
>             No required feature
>             1 feature(s) found in language system
>             Feature index 0 of 1: 0
>     1 feature(s) found in table
>     Feature 0 of 1: vert; 1 lookup(s)
>         1 lookup(s) found in feature
>         Lookup index 0 of 1: 0
>     1 lookup(s) found in table
>     Lookup 0 of 1: type 1, props 0x0001
>
> So substitutions are only used if the run that is shaped has Katakana (kana)
> script and language is set to Japanese (JAN). It works if I explicitly set
> the language and force the script to Katakana.
>
> But, in practice, that's of course not true! First, it breaks as soon as the
> system language is not Japanese, unless the language has been overridden.
> Second, not only Katakana characters have vertical variants. Punctuation
> might or might not be substituted depending on context, because punctuation
> characters have common script and assume the script of characters around
> them. If they're next to Kanji characters, it will break.
>
> Should fonts with GSUB tables like that be considered broken? What does
> Uniscribe do to make this work? And lastly, can I force HarfBuzz to just use
> the first 'vert' substitution lookup in case there's none to be found with
> matching or DFLT script/language system?
>
> Best regards
> Grigori
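The context dependence of punctuation mentioned in the quoted message can be illustrated with a small Python sketch. It works on a list of per-character script codes that has already been determined, so no Unicode script database is needed. The function name `resolve_common` and the choice to prefer the preceding neighbour are assumptions for the example, not a description of how any particular shaper itemizes text.

```python
# ISO 15924 codes for the Common and Inherited scripts.
NEUTRAL = {"Zyyy", "Zinh"}


def resolve_common(scripts):
    """Give Common/Inherited characters the script of a neighbour.

    `scripts` holds one script code per character. A neutral character
    takes the script of the nearest preceding real script; leading neutral
    characters take the script that follows them instead.
    """
    resolved = list(scripts)

    # Forward pass: inherit from the nearest preceding real script.
    last = None
    for i, code in enumerate(resolved):
        if code in NEUTRAL:
            if last is not None:
                resolved[i] = last
        else:
            last = code

    # Backward pass: leading neutral characters take the following script.
    following = None
    for i in range(len(resolved) - 1, -1, -1):
        if resolved[i] in NEUTRAL:
            if following is not None:
                resolved[i] = following
        else:
            following = resolved[i]

    return resolved
```

A punctuation mark after Katakana, `["Kana", "Zyyy"]`, resolves to `["Kana", "Kana"]` and would reach the font's only script table, while the same mark after Kanji, `["Hani", "Zyyy"]`, resolves to `["Hani", "Hani"]` and would not. A lone `["Zyyy"]` stays unresolved.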
Re: [HarfBuzz] 'vert' substitutions in CJK fonts
On 02/05/2013 01:50 AM, suzuki toshiya wrote:
> Hi,
>
> Thank you for opening the interesting discussion.
>
> I think the script names in the OpenType spec are not identical to the
> block names in Unicode; kana does not specify only the small group of
> katakana and hiragana, but also the group including katakana, hiragana,
> CJK ideographs, CJK punctuation, CJK symbols, etc.

They are not identical to Unicode, but kana indeed means just Katakana and Hiragana in OpenType, at least according to the specification:
https://www.microsoft.com/typography/otspec/scripttags.htm

The lack of detail in the OpenType specification is really bad... in this case it just says the tags are not always the same as Unicode's, but doesn't explain how they differ either. :(

> When I worked on poppler (a PDF rendering library), I ran into a similar
> problem:
> http://lists.freedesktop.org/archives/poppler/2012-March/008860.html
>
> I should note that the default language system strategy would not work well
> with (old versions of?) the Batang font (a Korean font bundled with
> Microsoft Windows).

Hmm, interesting... but lack of language-specific matching is not the problem here.

> The question there was how the OpenType layout features should be
> configured when vertical text is requested without an embedded font; I used
> the combinations CHN/hani for Chinese (Simplified or Traditional), JAN/kana
> for Japanese, and KOR/hang for Korean. But that was designed to fit the
> internal design of poppler; more comprehensive consideration would be
> expected for real i18n software.

So you use a fixed script for a given language? I don't know, but this seems quite hacky. Often you don't even know what language you're going to display. This might work in poppler's case, but in my case (rendering a line of Unicode text in arbitrary languages) it does not.

Best regards
Grigori
Re: [HarfBuzz] 'vert' substitutions in CJK fonts
Hi,

Grigori Goronzy wrote:
> On 02/05/2013 01:50 AM, suzuki toshiya wrote:
>> Hi,
>>
>> Thank you for opening the interesting discussion.
>>
>> I think the script names in the OpenType spec are not identical to the
>> block names in Unicode; kana does not specify only the small group of
>> katakana and hiragana, but also the group including katakana, hiragana,
>> CJK ideographs, CJK punctuation, CJK symbols, etc.
>
> They are not identical to Unicode, but kana indeed means just Katakana and
> Hiragana in OpenType, at least according to the specification:
> https://www.microsoft.com/typography/otspec/scripttags.htm

Oh, I had overlooked that kana appears twice, for Hiragana and Katakana.

> The lack of detail in the OpenType specification is really bad... in this
> case it just says the tags are not always the same as Unicode's, but
> doesn't explain how they differ either. :(

Indeed. I should ask the SC29/WG11 font AHG people about the possibility of further clarification.

>> When I worked on poppler (a PDF rendering library), I ran into a similar
>> problem:
>> http://lists.freedesktop.org/archives/poppler/2012-March/008860.html
>>
>> I should note that the default language system strategy would not work
>> well with (old versions of?) the Batang font (a Korean font bundled with
>> Microsoft Windows).
>
> Hmm, interesting... but lack of language-specific matching is not the
> problem here.

I'm sorry - yes, the font referrers for non-embedded CID-keyed fonts in PDF often provide additional information about the script, so such a method cannot be applied to the rendering of plain Unicode text.

>> The question there was how the OpenType layout features should be
>> configured when vertical text is requested without an embedded font; I
>> used the combinations CHN/hani for Chinese (Simplified or Traditional),
>> JAN/kana for Japanese, and KOR/hang for Korean. But that was designed to
>> fit the internal design of poppler; more comprehensive consideration would
>> be expected for real i18n software.
>
> So you use a fixed script for a given language? I don't know, but this
> seems quite hacky. Often you don't even know what language you're going to
> display. This might work in poppler's case, but in my case (rendering a
> line of Unicode text in arbitrary languages) it does not.

Indeed, it's not easy to guess the language from plain Unicode text. As I understand it, the problem in your first post was that the OpenType script tag is ambiguous, or (if the spec is read precisely) too narrow in practice to cover the Unicode characters that the feature should control. Is that correct?

Regards,
mpsuzuki