On Wed, Dec 29, 2021 at 01:35:05PM +0100, Patrice Dumas wrote:
> Here is my proposal for HTML
> * remove FALLBACK_TO_NUMERIC_ENTITY, always setting it for HTML (and
> never for TexinfoXML, or always; I'm not sure about it, and it probably
> does not matter much).
> * remove ENABLE_ENCODING_USE_ENTITY
> * if ENABLE_ENCODING is set, try to output the Unicode characters
> themselves, encoded in the output encoding, for everything: accents
> like @'e, @-commands like @l{}, or dashes and quotes.
I'm happy with this.
I couldn't find much information online about whether using the
entities or using raw UTF-8 was better.
I did find this page:
https://docs.microsoft.com/en-us/troubleshoot/browsers/wrong-character-set-for-html-page
and I do remember seeing that some old browsers gave you the choice of
which encoding to use for a page. Hence, using entities seems like
a more reliable way of specifying a character, in case the page encoding
is set/detected incorrectly by some old browser.
If a document has a lot of non-ASCII characters (e.g. if it's written
in Chinese), then the behaviour you describe with ENABLE_ENCODING would
be better.
Agreed that the choice for TexinfoXML doesn't matter.
>
> That would mean 3 possibilities for HTML
> * default, use named entities if possible, fallback to numeric entities
> * --enable-encoding triggers outputting encoded characters
> * with USE_NUMERIC_ENTITY output numeric entities
>
>
> Note that in most if not all cases, the actual output would still be
> guarded by the OUTPUT_ENCODING_NAME value, such that the conversions
> with ENABLE_ENCODING set are only done when they are known to be
> possible.
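For concreteness, here is a rough Python sketch of the three modes for a
single character, with the OUTPUT_ENCODING_NAME guard modelled as "can
this character be encoded in the output encoding?". The function name
convert_char, the mode strings, and falling back to the default entity
behaviour when a character cannot be encoded are my own assumptions for
illustration. This is not texi2any's actual code (which is in Perl). The
named entities come from Python's html.entities.codepoint2name, which
only covers the HTML 4 set, so something like @l{} (U+0142) gets a
numeric entity.

```python
from html.entities import codepoint2name


def convert_char(ch: str, mode: str = "default",
                 output_encoding: str = "utf-8") -> str:
    """Return HTML text for a single character ch.

    Hypothetical mode names, mirroring the proposal:
      "default"  - named entity if one exists, else numeric entity
      "encoding" - the character itself (as with --enable-encoding),
                   if the output encoding can represent it
      "numeric"  - always a numeric entity (as with USE_NUMERIC_ENTITY)
    """
    cp = ord(ch)
    if cp < 128:
        # Plain ASCII is output as-is (escaping of &, < and > is ignored here).
        return ch
    if mode == "encoding":
        try:
            ch.encode(output_encoding)
            return ch
        except UnicodeEncodeError:
            # Assumption: if the output encoding cannot represent the
            # character, behave as in the default mode.
            mode = "default"
    if mode == "default" and cp in codepoint2name:
        return f"&{codepoint2name[cp]};"
    return f"&#{cp};"
```

With this, @'e gives "&eacute;" by default, "&#233;" with numeric
entities, and a raw "é" with encoding enabled and UTF-8 output, while an
em dash with ISO-8859-1 output still comes out as "&mdash;".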
>
> Opinions, ideas?
>
> --
> Pat
>