On 08/01/11 12:20, Butrus Damaskus wrote:
> Hi Behdad,
>
> Don't you plan to add some property to switch normalization off?
No. Why?

behdad

> Thanks
>
> On Sun, Jul 24, 2011 at 5:37 AM, Behdad Esfahbod <[email protected]> wrote:
>> Hi,
>>
>> If you paid any attention, you may have noticed that I was hacking on getting
>> composition / decomposition working. And I'm glad to announce that it's all
>> done and working and in master already.
>>
>> Here's the relevant source file:
>>
>>   http://cgit.freedesktop.org/harfbuzz/tree/src/hb-ot-shape-normalize.cc
>>
>> These rely on the two new Unicode callbacks compose() and decompose(). I've
>> added similar APIs to glib (which will be shipped in 2.30), and the hb-glib
>> glue layer uses those if available, or falls back to using
>> g_utf8_normalize(), which is much, much slower. The hb-icu layer implements
>> these using unorm_normalize(), which has the same slowness problem. If
>> someone wants to look into adding a compose()/decompose() API to ICU, that
>> would be really cool.
>>
>> Here are a few lines about the design, copied from comments in the source:
>>
>> /*
>>  * HIGHLEVEL DESIGN:
>>  *
>>  * This file exports one main function: _hb_ot_shape_normalize().
>>  *
>>  * This function closely reflects the Unicode Normalization Algorithm,
>>  * yet it's different. The shaper can either prefer decomposed (NFD) or
>>  * composed (NFC).
>>  *
>>  * In general, what happens is: each grapheme is decomposed into a chain
>>  * of 1:2 decompositions, marks are reordered, and then it is recomposed
>>  * if desired; so far it's like Unicode Normalization. However, the
>>  * decomposition and recomposition only happen if the font supports the
>>  * resulting characters.
>>  *
>>  * The goals are:
>>  *
>>  *   - Try to render all canonically equivalent strings similarly. To really
>>  *     achieve this we have to always do the full decomposition and then
>>  *     selectively recompose from there. It's kinda too expensive though, so
>>  *     we skip some cases. For example, if composed is desired, we simply
>>  *     don't touch 1-character clusters that are supported by the font, even
>>  *     though their NFC may be different.
>>  *
>>  *   - When a font has a precomposed character for a sequence but the 'ccmp'
>>  *     feature in the font is not adequate, use the precomposed character,
>>  *     which typically has better mark positioning.
>>  *
>>  *   - When a font does not support a character but supports its
>>  *     decomposition, well, use the decomposition.
>>  *
>>  *   - The Indic shaper requests decomposed output. This handles splitting
>>  *     matras for the Indic shaper.
>>  */
>>
>> /* We do a fairly straightforward yet custom normalization process in three
>>  * separate rounds: decompose, reorder, recompose (if desired). Currently
>>  * this makes two buffer swaps. We can make it faster by moving the last
>>  * two rounds into the inner loop for the first round, but it's more
>>  * readable this way. */
>>
>>
>> Comments, feedback, and testing re functionality and performance are
>> appreciated!
>>
>> Cheers,
>> behdad
>> _______________________________________________
>> HarfBuzz mailing list
>> [email protected]
>> http://lists.freedesktop.org/mailman/listinfo/harfbuzz
>>
>
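For readers who want to see the three rounds from the quoted comments (decompose, reorder, recompose) side by side, here is a toy C++ model. It is a sketch under stated assumptions, not HarfBuzz's code: `Font`, `kPairs`, `ccc()`, `decompose_rec()` and `normalize()` are hypothetical stand-ins (the real shaper checks the font's cmap and gets Unicode data through its compose()/decompose() callbacks), the four-entry table is a hand-picked slice of Unicode canonical decompositions, and "a character not followed by a combining mark" is used as a crude approximation of a 1-character cluster. Recomposition follows the usual canonical-composition blocking rule, with the extra condition from the design notes that the composed character must exist in the font.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for the font's character coverage (its cmap).
struct Font {
  std::set<char32_t> cmap;
  bool has(char32_t u) const { return cmap.count(u) != 0; }
};

// Tiny hand-picked slice of Unicode canonical 1:2 decompositions.
struct Pair { char32_t composed, a, b; };
static const Pair kPairs[] = {
    {0x00E9, 0x0065, 0x0301},  // e-acute = e + COMBINING ACUTE ACCENT
    {0x00EA, 0x0065, 0x0302},  // e-circumflex = e + COMBINING CIRCUMFLEX ACCENT
    {0x1EB9, 0x0065, 0x0323},  // e-dot-below = e + COMBINING DOT BELOW
    {0x1EC7, 0x1EB9, 0x0302},  // e-circumflex-dot-below = U+1EB9 + U+0302
};

// Canonical combining class of the marks above; 0 means "starter".
static int ccc(char32_t u) {
  if (u == 0x0301 || u == 0x0302) return 230;
  if (u == 0x0323) return 220;
  return 0;
}

static bool decompose(char32_t ab, char32_t *a, char32_t *b) {
  for (const Pair &p : kPairs)
    if (p.composed == ab) { *a = p.a; *b = p.b; return true; }
  return false;
}

static bool compose(char32_t a, char32_t b, char32_t *ab) {
  for (const Pair &p : kPairs)
    if (p.a == a && p.b == b) { *ab = p.composed; return true; }
  return false;
}

// Appends u's 1:2 decomposition chain (recursing on the first element) if the
// font supports every resulting character; else appends u if the font has it.
static bool decompose_rec(const Font &font, char32_t u, std::u32string &out) {
  char32_t a = 0, b = 0;
  if (decompose(u, &a, &b) && font.has(b)) {
    std::u32string head;
    if (decompose_rec(font, a, head)) { out += head; out.push_back(b); return true; }
  }
  if (font.has(u)) { out.push_back(u); return true; }
  return false;
}

enum class Mode { Decomposed, Composed };

std::u32string normalize(const Font &font, const std::u32string &in, Mode mode) {
  // Round 1: decompose.
  std::u32string buf;
  for (std::size_t i = 0; i < in.size(); i++) {
    // Approximation: a character not followed by a mark is a 1-character cluster.
    bool single = i + 1 == in.size() || ccc(in[i + 1]) == 0;
    if (mode == Mode::Composed && single && font.has(in[i])) {
      buf.push_back(in[i]);  // supported 1-character cluster: leave untouched
      continue;
    }
    if (!decompose_rec(font, in[i], buf)) buf.push_back(in[i]);  // nothing supported: keep as is
  }
  // Round 2: stable-sort each run of marks by combining class.
  for (std::size_t i = 0; i < buf.size();) {
    if (ccc(buf[i]) == 0) { i++; continue; }
    std::size_t j = i;
    while (j < buf.size() && ccc(buf[j]) != 0) j++;
    std::stable_sort(buf.begin() + i, buf.begin() + j,
                     [](char32_t x, char32_t y) { return ccc(x) < ccc(y); });
    i = j;
  }
  if (mode == Mode::Decomposed) return buf;
  // Round 3: recompose unblocked marks onto the last starter, if the font has the result.
  std::u32string out;
  const std::size_t kNone = static_cast<std::size_t>(-1);
  std::size_t starter = kNone;
  for (char32_t c : buf) {
    assert(starter == kNone || starter < out.size());
    int cc = ccc(c);
    char32_t ab = 0;
    bool unblocked = starter != kNone && cc != 0 &&
                     (out.size() - 1 == starter || ccc(out.back()) < cc);
    if (unblocked && compose(out[starter], c, &ab) && font.has(ab)) {
      out[starter] = ab;  // mark absorbed into the precomposed character
      continue;
    }
    if (cc == 0) starter = out.size();
    out.push_back(c);
  }
  return out;
}
```

For example, with a font covering U+0065, U+0301, U+0302, U+0323, U+1EB9 and U+1EC7, composed-mode input U+0065 U+0302 U+0323 has its marks reordered to dot below (class 220) before circumflex (class 230) and then recomposes to the single character U+1EC7; if the font lacked U+1EC7, the output would stop at U+1EB9 U+0302. Conversely, U+00E9 in a font that has only U+0065 and U+0301 comes out decomposed, which matches the "use the decomposition" goal above.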
