> The easiest would be to add a new API analogous to hb_ot_font_set_funcs(), that does NOT have the symbol shift in it.
That works.

Thanks,
Eric.


On 1/19/18 4:43 PM, Behdad Esfahbod wrote:
Ok, let's see how we can address this...

I don't like a setting on the buffer as currently the get_glyph() callback has no way of accessing that information.  The easiest would be to add a new API analogous to hb_ot_font_set_funcs(), that does NOT have the symbol shift in it.  It's not the most elegant solution but easiest.  Would that work for you?
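
To make the proposal concrete, here is a minimal Python sketch of two parallel sets of font functions, one with the symbol shift and one without. Everything in it is hypothetical: `Font`, `set_funcs_with_symbol_shift` and `set_funcs_without_symbol_shift` are illustration names, not HarfBuzz API, the cmap is a plain dict, and the `<= 0xFF` bound is an assumption about where the shift applies.

```python
SYMBOL_SHIFT = 0xF000  # PUA offset used by 3,0 (symbol) cmap subtables


def lookup_with_symbol_shift(cmap, codepoint):
    """Direct lookup, then retry low code points at F000 + codepoint."""
    glyph = cmap.get(codepoint)
    if glyph is None and codepoint <= 0xFF:  # assumed bound
        glyph = cmap.get(SYMBOL_SHIFT + codepoint)
    return glyph


def lookup_without_symbol_shift(cmap, codepoint):
    """Direct lookup only; U+0041 never silently becomes U+F041."""
    return cmap.get(codepoint)


class Font:
    """Toy font object: a cmap dict plus a pluggable lookup callback."""

    def __init__(self, cmap):
        self.cmap = cmap
        self.nominal_glyph_func = lookup_with_symbol_shift  # assumed default

    def get_glyph(self, codepoint):
        return self.nominal_glyph_func(self.cmap, codepoint)


def set_funcs_with_symbol_shift(font):
    font.nominal_glyph_func = lookup_with_symbol_shift


def set_funcs_without_symbol_shift(font):
    font.nominal_glyph_func = lookup_without_symbol_shift


# Hypothetical symbol font that maps only U+F041 (glyph id 36 is made up).
gdi_like = Font({0xF041: 36})
set_funcs_with_symbol_shift(gdi_like)

strict = Font({0xF041: 36})
set_funcs_without_symbol_shift(strict)
```

With this arrangement the client chooses once, per font, and the lookup callback needs no access to the buffer.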

That said, this issue is also related, as it pertains to another non-Unicode encoding, though in the font, not the buffer:

https://github.com/harfbuzz/harfbuzz/issues/681

On Thu, Jan 18, 2018 at 11:27 PM, Eric Muller <emul...@amazon.com> wrote:

    I want to build a rendering system where U+0041 renders as an "A",
    regardless of the selected font.

    Eric.



    On 1/17/18 3:48 PM, Behdad Esfahbod wrote:
    What's the actual problem you are facing?

    On Mon, Jan 15, 2018 at 9:58 AM, Eric Muller <emul...@amazon.com> wrote:


        > It's clear that if the symbol font is asked by name, we
        > should do the shift.
        I think I disagree, in the sense that HB should not impose
        that behavior on its clients. HB is clearly the right place
        to implement the behavior, but the choice of having that
        behavior or not should be with the client.

        For any document format, rendering the moral equivalent of <p
        font-family='symbol'>&#x0041;</p> with something other than an
        "A" implies that all ASCII is PUA. That's a choice Word,
        InDesign, Notepad may make if they want, but it should not be
        imposed on all users of HB.

        Personally, I think it is a very bad choice for HTML, and
        Firefox seems to agree. It seems nice and user friendly at
        first, but this makes the document ambiguous. What about <p
        font-family='minion, symbol'>&#x0041;</p>? It's an A or not
        an A depending on the presence of "minion" in the client.
        What does the document mean?

        Of course, <p font-family='symbol'>&#xF041;</p> should render
        with the glyph symbol.cmap(F041). So even if the shift is
        never done, the glyph is usable. It's just that you don't
        have the convenience of an IME-like mechanism provided by the
        shaping engine, but you gain a reliable semantic for the text.

        > That's good behavior [in Word], but beyond what HarfBuzz can do.
        Yes, which is why the shift may be acceptable or even
        desirable for some clients, and so hopefully the client could
        choose.

        > What would clients do with that control then? How would they
        > set it?
        If I build an app that is meant to work like other GDI apps,
        I allow the shift (and maybe add mitigating measures like
        Word). If I build an app such as Firefox, I don't allow it.
        The choice is entirely driven by the type of application I
        want to build, and how I want to define my document format.


        If you were to implement this choice, I can see it either in
        the construction of the HB unicode functions, or in the
        hb_buffer (either globally, or on a character-by-character
        basis). I have a preference for the latter: this choice could
        be passed down to the cmap lookup functions, HB or not; it
        could also be different on different parts of a document,
        maybe reacting to markup.
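
As a rough illustration of the per-character variant, the following Python sketch attaches a flag to each buffer item and hands it down to the cmap lookup. The names `BufferItem`, `allow_symbol_shift` and `map_buffer` are invented for this example and do not correspond to anything in hb_buffer; the `<= 0xFF` bound is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class BufferItem:
    """One character in a toy buffer, carrying the client's choice."""
    codepoint: int
    allow_symbol_shift: bool = False  # hypothetical per-character flag


def lookup(cmap, item):
    """cmap lookup that honors the flag carried by the buffer item."""
    glyph = cmap.get(item.codepoint)
    if glyph is None and item.allow_symbol_shift and item.codepoint <= 0xFF:
        glyph = cmap.get(0xF000 + item.codepoint)  # shift into the PUA
    return glyph


def map_buffer(cmap, buffer):
    """Map every item to a glyph id (None means no glyph in this font)."""
    return [lookup(cmap, item) for item in buffer]


# Hypothetical symbol font that maps only U+F041 (glyph id 36 is made up).
symbol_cmap = {0xF041: 36}

buffer = [
    BufferItem(0x0041, allow_symbol_shift=True),  # markup asked for the shift
    BufferItem(0x0041),                           # plain "A": no shift
    BufferItem(0xF041),                           # explicit PUA always works
]
glyphs = map_buffer(symbol_cmap, buffer)  # [36, None, 36]
```

The same document can then mix both behaviors, for example by setting the flag only inside markup that explicitly opts in.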

        Eric.



        On 1/15/18 6:46 AM, Behdad Esfahbod wrote:
        Hi Eric,

        On Mon, Jan 15, 2018 at 2:25 AM, Eric Muller <emul...@amazon.com> wrote:

            It seems that with a font that has only a 3,0 cmap
            subtable (and maybe some Macintosh subtables), HB
            will automatically do the shift by F000 (in the function
            get_glyph_from_symbol) for code points below U+00FF that
            are not mapped by the subtable.
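
For readers who want to see the shift spelled out, here is a small Python model of the behavior described above. It is a sketch, not HarfBuzz's actual code: the cmap is a plain dict standing in for a 3,0 subtable, the function merely borrows the name get_glyph_from_symbol, and whether the bound includes U+00FF itself is an assumption.

```python
SYMBOL_SHIFT = 0xF000  # offset into the PUA used by symbol fonts


def get_glyph_from_symbol(cmap, codepoint):
    """Return a glyph id for codepoint, or None if the font has no glyph.

    cmap is a dict {codepoint: glyph_id} standing in for a 3,0 subtable.
    """
    glyph = cmap.get(codepoint)
    if glyph is not None:
        return glyph  # a direct mapping always wins
    if codepoint <= 0xFF:  # assumed bound
        # Unmapped low code point: retry in the F000..F0FF range.
        return cmap.get(SYMBOL_SHIFT + codepoint)
    return None


# Hypothetical symbol font that maps only U+F041 (glyph id 36 is made up).
symbol_cmap = {0xF041: 36}

shifted = get_glyph_from_symbol(symbol_cmap, 0x0041)  # 36, via the shift
direct = get_glyph_from_symbol(symbol_cmap, 0xF041)   # 36, mapped directly
missing = get_glyph_from_symbol(symbol_cmap, 0x0141)  # None, outside the range
```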


        Right. Only in hb-ot-font though. Client font funcs can do
        otherwise.

            It is clear that when U+0041 A is set with a symbol
            font, then that U+0041 actually has the semantics of a
            PUA code point, and certainly should not be treated as
            an "A". That's the whole point of a 3,0 cmap subtable.


        Correct.

            Consider an HTML page. The font-family is only a request
            and there is no guarantee that the actual font will or
            will not be a symbol font. Thus the semantic of the HTML
            page can change depending on the browser environment.
            Outside a browser, it seems that the safe treatment is
            therefore to consider all code points below U+00FF as
            PUA, which is clearly not tenable. So in that
            environment, I think that the shift should not be done.
            Of course, U+F041 should work.


        My take on this is that it's a bug of the font fallback
        logic if it falls back to a symbol font.  I changed
        fontconfig to never do that.

            Note that the behavior of Word 2016 on Windows is actually
            more elaborate: enter U+0041, and set it with a
            non-symbol font; copy/paste or save to a text file, and
            the result is U+0041; but set this A in a symbol font,
            and copy/paste or save to a text file, and the result is
            U+F041.
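
The observed Word behavior can be modeled roughly as follows. This Python sketch only reproduces the outcome described above; how Word implements it is unknown, and the run representation, the `font_is_symbol` flag and the `<= 0xFF` bound are all assumptions made for illustration.

```python
def export_plain_text(runs):
    """Flatten styled runs to plain text, moving symbol-font text to the PUA.

    runs is a list of (text, font_is_symbol) pairs, a made-up stand-in
    for a styled document.
    """
    out = []
    for text, font_is_symbol in runs:
        for ch in text:
            cp = ord(ch)
            if font_is_symbol and cp <= 0xFF:  # assumed bound
                cp += 0xF000  # record the PUA code point actually meant
            out.append(chr(cp))
    return "".join(out)


# The same typed "A", once in a regular font and once in a symbol font.
runs = [("A", False), ("A", True)]
exported = export_plain_text(runs)  # "A" followed by U+F041
```

The exported text is unambiguous: it no longer depends on which font the receiving application picks.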


        That's good behavior, but beyond what HarfBuzz can do.

            I think that the shift should be controllable by the
            client, rather than systematically applied. I don't have
            a strong opinion about the default behavior (i.e. when
            HB's client does not specify whether the shift should be
            done or not).


        What would clients do with that control then? How would they
        set it?

        I implemented this shift in fontconfig and then harfbuzz
        because in LibreOffice and other software, there were
        existing documents that referred to Wingdings or other symbol
        fonts and encoded characters in the ASCII range. It's clear
        that if the symbol font is asked by name, we should do the
        shift. If it's NOT, then it should not be chosen to render
        text to begin with, which means the shift can be applied
        unconditionally.

        How does that sound?
        behdad

            Thoughts?

            Thanks,
            Eric.

-- behdad
        http://behdad.org/





_______________________________________________
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
