On Mon, 30 Sep 2013 19:36:19 +0100 Jonathan Kew <[email protected]> wrote:
> On 30/9/13 19:08, Behdad Esfahbod wrote:
> > On 13-09-30 09:05 AM, Toresson, Alexander (EXT) wrote:
> >> Hello all,
> >>
> >>
> >> For for example Bengali, a dotted circle (U+25CC) is inserted
> >> before standalone combining marks. The same is not done for Thai,
> >> except for the first character in a paragraph/text (--bot for
> >> hb-shape/hb-view). Why? According to
> >> http://www.microsoft.com/typography/otfntdev/thaiot/other.htm,
> >> “invalid combinations” should cause a dotted circle to be inserted.
> >
> > That's something we want to fix, but we have not got to yet.
> >
>
> ....although it raises the difficult and potentially controversial
> question of what exactly is an "invalid combination".

And for the Thai script, there are a few general-purpose diacritics
which are used by dictionary and minority writing systems, such as
U+0331 COMBINING MACRON BELOW and U+0359 COMBINING ASTERISK BELOW.

The Microsoft restrictions are pretty liberal, especially compared to
the Lao ones, which prohibit Pali. However, the prohibition on two tone
marks is iffy, as combinations of tone marks do occur in Tai Lue
written in the Lao script. (Uniscribe, as described, also prohibits Tai
Lue in the Lao script.)

What makes eminent sense, but is probably unduly hard, is to treat
allegedly 'invalid combinations' as invalid only if mark-to-mark
positioning is not employed. Superimposed marks are unreadable and
probably wrong. Indeed, some of the abominable Latin script
combinations lurking in the Common Locale Data Repository (CLDR) would
benefit from the automatic insertion of dotted circles.

> >> Speaking of invalid combinations, it seems like HarfBuzz allows
> >> for example U+0E48 to be combined with for example latin U+0041,
> >> which seems rather permissive.

Thai and Lao combining marks are frequently displayed on hyphen- or
'x'-shaped characters. The preferred choices seem to be U+2013 EN DASH
for Thai and U+00D7 MULTIPLICATION SIGN for Lao, though I have
encountered the counter-claim that the latter is actually a sans-serif
'x' when used as a base character.

Richard.
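
For illustration only, the simplest form of the rule discussed in this
thread (put a dotted circle before any combining mark that has no base)
might be sketched in Python as below. This is not HarfBuzz's or
Uniscribe's actual logic. The function names are hypothetical, and the
sketch assumes that any character other than a separator, control,
format or unassigned character is an acceptable base. Under that
deliberately permissive assumption, U+0E48 on Latin 'A', on EN DASH or
on MULTIPLICATION SIGN is left alone, and no script-specific
restriction (such as a ban on two tone marks) is applied.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def is_base(ch):
    """Assumption for this sketch: anything that is not a separator (Z*)
    or a control/format/unassigned character (C*) may carry a mark."""
    return unicodedata.category(ch)[0] not in ("Z", "C")


def insert_dotted_circles(text):
    """Insert U+25CC before each run of marks that has no base."""
    out = []
    has_base = False  # nothing to attach to at the start of the text
    for ch in text:
        if is_mark(ch):
            if not has_base:
                # Standalone mark: supply a dotted circle as its base.
                out.append(DOTTED_CIRCLE)
                has_base = True  # later marks in the run share this base
        else:
            has_base = is_base(ch)
        out.append(ch)
    return "".join(out)
```

With these definitions, insert_dotted_circles("\u0E48") returns U+25CC
followed by U+0E48, while "\u0E01\u0E48" (KO KAI plus MAI EK) and
"\u2013\u0E48" (EN DASH plus MAI EK) come back unchanged. A stricter
policy would replace is_base with a check against the preceding
character's script or class, which is exactly where the question of
what counts as an "invalid combination" has to be answered.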
