It seems that with a font that has only a 3,0 cmap subtable (and maybe some Macintosh subtables), HB will automatically apply the 0xF000 shift (in the function get_glyph_from_symbol) for code points below U+00FF that are not mapped by that subtable.
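For illustration only, here is a self-contained toy in C (not the actual HarfBuzz source, just my reading of the behavior): a fake symbol cmap that maps only the U+F0xx PUA block, plus the fallback that retries unmapped low code points at 0xF000 + cp.

/* Toy sketch, not HarfBuzz code: a "3,0 cmap" that maps only U+F020..U+F0FF,
 * and the fallback under discussion for unmapped code points <= U+00FF. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

static bool toy_cmap_lookup (uint32_t cp, uint32_t *glyph)
{
  if (cp >= 0xF020u && cp <= 0xF0FFu) { *glyph = cp - 0xF020u + 3; return true; }
  return false;
}

static bool toy_get_glyph (uint32_t cp, uint32_t *glyph)
{
  if (toy_cmap_lookup (cp, glyph)) return true;
  if (cp <= 0x00FFu)                               /* unmapped and low... */
    return toy_cmap_lookup (0xF000u + cp, glyph);  /* ...retry in the PUA block */
  return false;
}

int main (void)
{
  uint32_t g;
  printf ("U+0041 -> %s\n", toy_get_glyph (0x0041u, &g) ? "glyph (via U+F041)" : "notdef");
  printf ("U+F041 -> %s\n", toy_get_glyph (0xF041u, &g) ? "glyph" : "notdef");
  return 0;
}

With a font like that, U+0041 and U+F041 end up on the same glyph, which is the behavior I am questioning below.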

It is clear that when U+0041 A is set with a symbol font, that U+0041 actually has the semantics of a PUA code point, and certainly should not be treated as an "A". That's the whole point of a 3,0 cmap subtable.

Consider an HTML page: the font-family is only a request, and there is no guarantee that the actual font will or will not be a symbol font. Thus the semantics of the HTML page can change depending on the browser environment. Outside a browser, where the font actually rendered is unknown, the only safe treatment would therefore be to consider all code points below U+00FF as PUA, which is clearly not tenable. So in that environment, I think the shift should not be done. Of course, U+F041 should still work.

Note that the behavior of Word 2016 on Windows is actually more elaborate: enter U+0041 and set it in a non-symbol font; copy/paste or save to a text file, and the result is U+0041. But set that same A in a symbol font, copy/paste or save to a text file, and the result is U+F041.

I think the shift should be controllable by the client rather than applied systematically. I don't have a strong opinion about the default behavior (i.e. what happens when HB's client does not specify whether the shift should be applied).
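To make "controllable by the client" concrete, here is a rough sketch of what the client side could look like if HB left the decision to the application. The buffer calls (hb_buffer_add, hb_buffer_set_content_type) are real HarfBuzz API, but the is_symbol_font flag is the client's own knowledge from its font-selection logic, and the whole division of labor is my assumption, not an existing HB mode.

#include <hb.h>
#include <stdint.h>

/* The client applies the shift itself, only when it knows the chosen
 * font is symbol-encoded; HB would then not need to second-guess it. */
static void
add_text_with_symbol_policy (hb_buffer_t *buf,
                             const uint32_t *text, unsigned int len,
                             int is_symbol_font)
{
  for (unsigned int i = 0; i < len; i++)
  {
    uint32_t cp = text[i];
    if (is_symbol_font && cp <= 0x00FFu)
      cp += 0xF000u;              /* client-side shift into the PUA block */
    hb_buffer_add (buf, cp, i);   /* keep the original cluster index */
  }
  hb_buffer_set_content_type (buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
}

Whether the matching "don't shift" knob on the HB side should be a buffer flag, a font property, or something else is precisely the part I don't have a strong opinion on.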

Thoughts?

Thanks,
Eric.

