What's the actual problem you are facing?
On Mon, Jan 15, 2018 at 9:58 AM, Eric Muller <emul...@amazon.com> wrote:
It's clear that if the symbol font is asked by name, we
should do the shift.
I think I disagree, in the sense that HB should not impose
that behavior on its clients. HB is clearly the right place
to implement the behavior, but the choice of having that
behavior or not should be with the client.
For any document format, rendering the moral equivalent of <p
font-family='symbol'>A</p> with something other than an
"A" implies that all ASCII is PUA. That's a choice Word,
InDesign, Notepad may make if they want, but it should not be
imposed on all users of HB.
Personally, I think it is a very bad choice for HTML, and
Firefox seems to agree. It seems nice and user friendly at
first, but this makes the document ambiguous. What about <p
font-family='minion, symbol'>A</p>? It's an A or not
an A depending on the presence of "minion" in the client.
What does the document mean?
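
To make the ambiguity concrete, here is a small Python model of font-list fallback combined with the shift. It is only a sketch: the dicts stand in for cmap tables, the glyph names are made up, and `render` is a hypothetical helper, not any browser's or HarfBuzz's logic.

```python
# Toy model: the same markup resolves to different glyphs depending on
# which fonts are installed. All names and data here are hypothetical.
def render(families, installed, codepoint):
    """Return (font name, glyph) for the first family that maps the code point."""
    for name in families:
        cmap = installed.get(name)
        if cmap is None:
            continue  # font not present on this client
        glyph = cmap.get(codepoint)
        if glyph is None and codepoint <= 0xFF:
            # The symbol shift under discussion: retry at U+F000 + codepoint.
            glyph = cmap.get(0xF000 + codepoint)
        if glyph is not None:
            return (name, glyph)
    return None

minion = {0x41: "latin-A"}
symbol = {0xF041: "symbol-glyph"}
families = ["minion", "symbol"]

with_minion = render(families, {"minion": minion, "symbol": symbol}, 0x41)
without_minion = render(families, {"symbol": symbol}, 0x41)
```

With "minion" installed the text is a Latin A; without it, the same U+0041 silently becomes a symbol glyph.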
Of course, <p font-family='symbol'>&#xF041;</p> should render
with the glyph symbol.cmap(F041). So even if the shift is
never done, the glyph is usable. It's just that you don't
have the convenience of an IME-like mechanism provided by the
shaping engine, but you gain reliable semantics for the text.
That's good behavior [in Word], but beyond what HarfBuzz can do.
Yes, which is why the shift may be acceptable or even
desirable for some clients, and so hopefully the client could
choose.
What would clients do with that control then? How would they
set it?
If I build an app that is meant to work like other GDI apps,
I allow the shift (and maybe add mitigating measures like
Word). If I build an app such as Firefox, I don't allow it.
The choice is entirely driven by the type of application I want
to build, and how I want to define my document format.
If you were to implement this choice, I can see it either in
the construction of the HB unicode functions, or in the
hb_buffer (either globally, or on a character-by-character
basis). I have a preference for the latter: this choice could
be passed down to the cmap lookup functions, HB or not; it
could also be different on different parts of a document, maybe
reacting to markup.
Eric.
On 1/15/18 6:46 AM, Behdad Esfahbod wrote:
Hi Eric,
On Mon, Jan 15, 2018 at 2:25 AM, Eric Muller <emul...@amazon.com> wrote:
It seems that with a font that has only a 3,0 cmap
subtable (and maybe some Macintosh subtables), then HB
will automatically do the shift by F000 (in the function
get_glyph_from_symbol) for code points below U+00FF that
are not mapped by the subtable.
Right. Only in hb-ot-func though. Client font funcs can do
otherwise.
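
For readers following along, the fallback described above can be modeled in a few lines of Python. This is a sketch of the behavior as described in this thread, not HarfBuzz's code: the cmap is a plain dict, `lookup_with_symbol_shift` is a hypothetical name, and treating U+00FF itself as included in the range is an assumption.

```python
# Toy model of the symbol-font shift; the dict stands in for a (3,0)
# cmap subtable and the glyph id is made up.
SYMBOL_SHIFT = 0xF000

def lookup_with_symbol_shift(cmap, codepoint):
    """Return a glyph id, or None if the code point is unmapped."""
    glyph = cmap.get(codepoint)
    if glyph is not None:
        return glyph
    # Direct lookup failed: for low code points, retry in the PUA range.
    if codepoint <= 0x00FF:
        return cmap.get(SYMBOL_SHIFT + codepoint)
    return None

symbol_cmap = {0xF041: 36}
shifted = lookup_with_symbol_shift(symbol_cmap, 0x0041)
direct = lookup_with_symbol_shift(symbol_cmap, 0xF041)
```

Both U+0041 and U+F041 end up at the same glyph here, which is exactly the convenience, and the ambiguity, being debated.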
It is clear that when U+0041 A is set with a symbol
font, then that U+0041 actually has the semantics of a
PUA code point, and certainly should not be treated as
an "A". That's the whole point of a 3,0 cmap subtable.
Correct.
Consider an HTML page. The font-family is only a request
and there is no guarantee that the actual font will or
will not be a symbol font. Thus the semantics of the HTML
page can change depending on the browser environment.
Outside a browser, it seems that the safe treatment is
therefore to consider all code points below U+00FF as
PUA, which is clearly not tenable. So in that
environment, I think that the shift should not be done.
Of course, U+F041 should work.
My take on this is that it's a bug of the font fallback
logic if it falls back to a symbol font. I changed
fontconfig to never do that.
Note that behavior of Word 2016 on Windows is actually
more elaborate: enter U+0041, and set it with a
non-symbol font; copy/paste or save to a text file, and
the result is U+0041; but set this A in a symbol font,
and copy/paste or save to a text file, and the result is
U+F041.
That's good behavior, but beyond what HarfBuzz can do.
I think that the shift should be controllable by the
client, rather than systematically applied. I don't have
a strong opinion about the default behavior (i.e. when
HB's client does not specify whether the shift should be
done or not).
What would clients do with that control then? How would they
set it?
I implemented this shift in fontconfig and then harfbuzz
because in LibreOffice and other software, there were
existing documents that referred to Wingdings or other symbol
fonts and encoding characters in the ASCII range. It's clear
that if the symbol font is asked by name, we should do the
shift. If it's NOT, then it should not be chosen to render
text to begin with, which means the shift can be applied
unconditionally.
How does that sound?
behdad
Thoughts?
Thanks,
Eric.
--
behdad
http://behdad.org/