Motivated by this discussion, I managed to add script itemization to the HarfBuzz “port” of LibreOffice. Just in case anyone is interested:
https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=1615b7f1d078b2bdf22a856066346e701f816b72
Regards,
Khaled

On Fri, Jan 10, 2014 at 05:18:51PM +0200, Ariel Malka wrote:
> Follow-up to an earlier discussion with Khaled:
>
> > You basically scan the text, itemize it into contiguous script runs and
> > shape each one separately using HarfBuzz. If you are also doing BiDi
> > itemization, then both can interfere (you might end up with runs
> > containing only characters with the common script property after doing BiDi,
> > so they will be shaped with the default script, which can be wrong), so
> > you need to do script itemization first, and BiDi itemization separately,
> > then combine both to get runs of the same script and direction to be
> > shaped separately.
>
> This has been synthesized into:
> https://github.com/arielm/Unicode/tree/master/Projects/BIDI
>
> The relevant "action" is taking place here:
> https://github.com/arielm/Unicode/blob/master/Projects/BIDI/src/TextItemizer.cpp
>
> HTH,
> Ariel
>
>
> On Sun, Dec 15, 2013 at 5:02 PM, Khaled Hosny wrote:
>
> > On Sun, Dec 15, 2013 at 04:38:51PM +0200, Ariel Malka wrote:
> > > I have rendered text successfully with a few different complex scripts
> > > ("Hebr", "Arab", "Hang", "Hani", "Thai", etc.) and it looks like
> > > hb_buffer_set_language() is not affecting the result.
> > >
> > > The first question I'm asking is therefore: what is the purpose
> > > of hb_buffer_set_language()?
> > > Or in other words: is there a combination which requires both the language
> > > and script values to be defined?
> >
> > Many fonts have language-specific features, for example:
> > https://bugs.webkit.org/show_bug.cgi?id=37984
> >
> > Without setting a language, HarfBuzz will use the ‘dflt’ language (AFAIK)
> > and the result can be wrong in such cases.
> >
> > > My second question is regarding mapping: is there a way to obtain a
> > > hb_script_t from a language-code string (e.g. "he" -> HB_SCRIPT_HEBREW)?
> >
> > Many languages are written in different scripts, so there is not always
> > a one-to-one language to script mapping. The proper way to get the
> > script of a piece of text is by checking the script property of its
> > characters, using the algorithm described by Unicode:
> > http://www.unicode.org/reports/tr24/
> >
> > You basically scan the text, itemize it into contiguous script runs and
> > shape each one separately using HarfBuzz. If you are also doing BiDi
> > itemization, then both can interfere (you might end up with runs
> > containing only characters with the common script property after doing BiDi,
> > so they will be shaped with the default script, which can be wrong), so
> > you need to do script itemization first, and BiDi itemization separately,
> > then combine both to get runs of the same script and direction to be
> > shaped separately.
> >
> > I find this code an easy-to-grasp example:
> > https://github.com/mapnik/mapnik/blob/master/src/text/itemizer.cpp
> >
> > Regards,
> > Khaled
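
The itemization steps described in the thread (script runs first, BiDi runs separately, then the intersection of the two) can be sketched roughly as follows. This is only a minimal illustration, not the code from LibreOffice, mapnik, or Ariel's project. The names toy_script, Run, DirRun, itemize_scripts and combine_runs are hypothetical, the script classifier knows just a handful of ranges, and the direction runs are assumed to come from a separate BiDi implementation (FriBidi or ICU, for instance). A real implementation would also need the finer rules of UAX #24 (Inherited characters, paired brackets and so on).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy script classes. A real implementation would query the Unicode Script
// property instead (assumption: via a Unicode library of your choice).
enum class Script { Common, Latin, Hebrew, Arabic };

// Hypothetical and deliberately incomplete classifier, for illustration only.
Script toy_script(char32_t c) {
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return Script::Latin;
    if (c >= 0x0590 && c <= 0x05FF) return Script::Hebrew;
    if (c >= 0x0600 && c <= 0x06FF) return Script::Arabic;
    return Script::Common;  // spaces, digits, punctuation, everything else
}

struct Run {
    std::size_t start, end;  // half-open range [start, end) in logical order
    Script script;
    bool rtl;
};

struct DirRun {
    std::size_t start, end;  // half-open range [start, end) in logical order
    bool rtl;
};

// Step 1: split the text into contiguous script runs. Common characters join
// the preceding run; leading Common characters adopt the first real script.
std::vector<Run> itemize_scripts(const std::u32string& text) {
    std::vector<Run> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const Script s = toy_script(text[i]);
        if (runs.empty()) {
            runs.push_back({i, i + 1, s, false});
        } else if (s == Script::Common || s == runs.back().script) {
            runs.back().end = i + 1;
        } else if (runs.back().script == Script::Common) {
            runs.back().script = s;
            runs.back().end = i + 1;
        } else {
            runs.push_back({i, i + 1, s, false});
        }
    }
    return runs;
}

// Step 2: intersect script runs with direction runs computed separately, so
// that every resulting run has exactly one script and one direction.
std::vector<Run> combine_runs(const std::vector<Run>& scripts,
                              const std::vector<DirRun>& dirs) {
    // Both lists are expected to cover the same text.
    assert(scripts.empty() == dirs.empty());
    assert(scripts.empty() || scripts.back().end == dirs.back().end);

    std::vector<Run> out;
    std::size_t i = 0, j = 0;
    while (i < scripts.size() && j < dirs.size()) {
        const std::size_t start = std::max(scripts[i].start, dirs[j].start);
        const std::size_t end = std::min(scripts[i].end, dirs[j].end);
        if (start < end) out.push_back({start, end, scripts[i].script, dirs[j].rtl});
        // Advance whichever list finishes first.
        if (scripts[i].end <= dirs[j].end) ++i; else ++j;
    }
    return out;
}
```

For Hebrew text with embedded digits, BiDi alone produces a left-to-right run that contains only Common characters (the digits). Because the script run was computed first, the combined run still carries the Hebrew script. Each combined run could then be shaped on its own, setting its script and direction with hb_buffer_set_script() and hb_buffer_set_direction().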
