Follow-up to an earlier discussion with Khaled:

> You basically scan the text, itemize it into contiguous script runs and
> shape each one separately using HarfBuzz. If you are also doing BiDi
> itemization, then both can interfere (you might end up with runs
> containing only characters with the Common script property after doing
> BiDi, so they will be shaped with the default script, which can be wrong),
> so you need to do script itemization first and BiDi itemization separately,
> then combine both to get runs of the same script and direction to be
> shaped separately.
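
The "combine both" step quoted above can be sketched as an intersection of two run lists. This is only a minimal illustration, assuming both lists cover the same text range contiguously; the types and the mergeRuns() name are hypothetical and are not taken from TextItemizer.cpp or from HarfBuzz:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only. All ranges are half-open: [start, end).
struct ScriptRun { std::size_t start, end; std::string script; };
enum class Direction { LTR, RTL };
struct DirectionRun { std::size_t start, end; Direction dir; };
struct ShapingRun { std::size_t start, end; std::string script; Direction dir; };

// Intersect script runs with direction runs so that every resulting run
// has exactly one script and one direction.
std::vector<ShapingRun> mergeRuns(const std::vector<ScriptRun>& scripts,
                                  const std::vector<DirectionRun>& dirs)
{
    std::vector<ShapingRun> out;
    if (scripts.empty() || dirs.empty()) return out;

    // Assumption: both itemizations were computed over the same text range.
    assert(scripts.front().start == dirs.front().start);
    assert(scripts.back().end == dirs.back().end);

    std::size_t i = 0, j = 0;
    while (i < scripts.size() && j < dirs.size()) {
        const std::size_t start = std::max(scripts[i].start, dirs[j].start);
        const std::size_t end = std::min(scripts[i].end, dirs[j].end);
        if (start < end) {
            out.push_back({start, end, scripts[i].script, dirs[j].dir});
        }
        // Advance whichever run ends at the cut point (both, if they end together).
        const bool scriptDone = (scripts[i].end == end);
        const bool dirDone = (dirs[j].end == end);
        if (scriptDone) ++i;
        if (dirDone) ++j;
    }
    return out;
}
```

For example, script runs [0,5) "Latn" and [5,12) "Arab" combined with direction runs [0,3) LTR and [3,12) RTL give three runs to shape: [0,3) Latn/LTR, [3,5) Latn/RTL and [5,12) Arab/RTL.
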

The approach Khaled describes has been synthesized into:
https://github.com/arielm/Unicode/tree/master/Projects/BIDI

The relevant "action" is taking place here:
https://github.com/arielm/Unicode/blob/master/Projects/BIDI/src/TextItemizer.cpp

HTH,
Ariel

On Sun, Dec 15, 2013 at 5:02 PM, Khaled Hosny <[email protected]> wrote:

> On Sun, Dec 15, 2013 at 04:38:51PM +0200, Ariel Malka wrote:
> > I have rendered text successfully with a few different complex scripts
> > ("Hebr", "Arab", "Hang", "Hani", "Thai", etc.) and it looks like
> > hb_buffer_set_language() is not affecting the result.
> >
> > The first question I'm asking is therefore: what is the purpose
> > of hb_buffer_set_language()?
> > Or in other words: is there a combination which requires both the
> > language and script values to be defined?
>
> Many fonts have language-specific features, for example:
> https://bugs.webkit.org/show_bug.cgi?id=37984
>
> Without setting a language, HarfBuzz will use the ‘dflt’ language (AFAIK)
> and the result can be wrong in such cases.
>
> > My second question is regarding mapping: is there a way to obtain a
> > hb_script_tag from a language-code string (e.g. "he" ->
> > HB_SCRIPT_HEBREW)?
>
> Many languages are written in different scripts, so there is not always
> a one-to-one language-to-script mapping. The proper way to get the
> script of a piece of text is by checking the script property of its
> characters, using the algorithm described by Unicode:
> http://www.unicode.org/reports/tr24/
>
> You basically scan the text, itemize it into contiguous script runs and
> shape each one separately using HarfBuzz. If you are also doing BiDi
> itemization, then both can interfere (you might end up with runs
> containing only characters with the Common script property after doing
> BiDi, so they will be shaped with the default script, which can be wrong),
> so you need to do script itemization first and BiDi itemization separately,
> then combine both to get runs of the same script and direction to be
> shaped separately.
>
> I find this code an easy-to-grasp example:
> https://github.com/mapnik/mapnik/blob/master/src/text/itemizer.cpp
>
> Regards,
> Khaled
>
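
The script scan described in the quoted message could look roughly like the sketch below. It is a simplification under stated assumptions: scriptOf() is a toy stand-in for the real Unicode Script property (it only knows a few letter ranges), and the rule for Common characters (join the current run, or adopt the first real script if the text starts with them) is one common heuristic, not the full handling from the Unicode report. All names here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Script { Common, Latin, Hebrew, Arabic };

// Toy stand-in for the Unicode Script property: only a few letter ranges
// are recognized, everything else is treated as Common. A real itemizer
// would query a Unicode database instead.
Script scriptOf(char32_t c)
{
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return Script::Latin;
    if (c >= 0x05D0 && c <= 0x05EA) return Script::Hebrew;   // Hebrew letters
    if ((c >= 0x0621 && c <= 0x063A) ||
        (c >= 0x0641 && c <= 0x064A)) return Script::Arabic; // Arabic letters
    return Script::Common;
}

struct ScriptRun { std::size_t start, end; Script script; }; // half-open [start, end)

// Split the text into contiguous runs of a single script.
std::vector<ScriptRun> itemizeScripts(const std::u32string& text)
{
    std::vector<ScriptRun> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const Script s = scriptOf(text[i]);
        if (runs.empty()) {
            runs.push_back({i, i + 1, s});
            continue;
        }
        ScriptRun& last = runs.back();
        if (s == Script::Common || s == last.script) {
            last.end = i + 1;          // Common characters join the current run
        } else if (last.script == Script::Common) {
            last.script = s;           // leading Common run adopts the first real script
            last.end = i + 1;
        } else {
            runs.push_back({i, i + 1, s});
        }
    }
    return runs;
}
```

With this heuristic a run consists only of Common characters when the whole text does, which is the case the quoted message warns about when script detection is done after BiDi splitting.
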
_______________________________________________ HarfBuzz mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/harfbuzz
