Re: simplifying configuration of encoded characters/entities output

Gavin Smith Wed, 29 Dec 2021 07:51:19 -0800

On Wed, Dec 29, 2021 at 01:35:05PM +0100, Patrice Dumas wrote:
> Here is my proposal for HTML
> * remove FALLBACK_TO_NUMERIC_ENTITY, always setting it for HTML (and
>   never for TexinfoXML, or always set, not sure about it, and probably
>   does not matter much).
> * remove ENABLE_ENCODING_USE_ENTITY
> * if ENABLE_ENCODING is set, try to output encoded Unicode characters
>   everywhere, be it for accents like @'e, @-commands like @l{}, or
>   dashes and quotes.


I'm happy with this.

I couldn't find much information online about whether using the
entities or using raw UTF-8 was better.

I did find this page:
https://docs.microsoft.com/en-us/troubleshoot/browsers/wrong-character-set-for-html-page

and I do remember seeing that some old browsers gave you the choice of
which encoding to use for a page.  Hence, using entities seems like
a more reliable way of specifying a character, in case the page encoding
is set/detected incorrectly by some old browser.
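
To illustrate the point, here is a small Python sketch (my own example,
not taken from the page above) of what happens when UTF-8 output is
read as Latin-1, compared with ASCII-only output that uses a numeric
entity. The sample string is made up for the demonstration.

```python
import html

text = "caf\u00e9"  # "café", a made-up sample string

# Raw UTF-8 bytes, misread as Latin-1 by a browser using the wrong charset.
raw_bytes = text.encode("utf-8")
misread_raw = raw_bytes.decode("latin-1")

# ASCII-only output: non-ASCII characters become numeric entities.
entity_bytes = text.encode("ascii", errors="xmlcharrefreplace")
# The same wrong charset guess is harmless here, since the bytes are ASCII.
misread_entity = html.unescape(entity_bytes.decode("latin-1"))

print(repr(misread_raw))     # the e-acute turns into two wrong characters
print(entity_bytes)          # b'caf&#233;'
print(misread_entity == text)
```

The entity form survives because every byte is ASCII, which Latin-1 and
UTF-8 decode identically.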

If a document has a lot of non-ASCII characters (e.g. if it's written
in Chinese), then the behaviour you describe for ENABLE_ENCODING would
be better.

Agreed that the choice for TexinfoXML doesn't matter.

> 
> That would mean 3 possibilities for HTML
> * default, use named entities if possible, fallback to numeric entities
> * --enable-encoding triggers outputting encoded characters
> * with USE_NUMERIC_ENTITY output numeric entities
> 
> 
> Note that in most if not all cases, the actual output would still be
> guarded by the OUTPUT_ENCODING_NAME value, such that the conversions
> with ENABLE_ENCODING set are only done when they are known to be
> possible.
> 
> Opinions, ideas?
> 
> -- 
> Pat
>
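
For what it's worth, the three possibilities quoted above could be
sketched roughly as follows. This is only an illustration in Python, not
the actual converter code: the function name `render_char` and the mode
strings are hypothetical, and Python's HTML 4 entity table stands in for
whatever list of named entities the converter really uses. Escaping of
`&`, `<` and `>` is left out.

```python
from html.entities import codepoint2name  # HTML 4 named entities only


def render_char(ch, mode="default", output_encoding="utf-8"):
    """Render one character for HTML (hypothetical helper).

    mode "default": named entity if one exists, else numeric entity
    mode "encoded": raw character if the output encoding can represent it
    mode "numeric": always a numeric entity
    """
    cp = ord(ch)
    if cp < 128:
        return ch  # plain ASCII passes through unchanged
    if mode == "encoded":
        try:
            ch.encode(output_encoding)  # guard on the output encoding
            return ch
        except UnicodeEncodeError:
            mode = "default"  # not representable: fall back to entities
    if mode == "default" and cp in codepoint2name:
        return "&%s;" % codepoint2name[cp]
    return "&#%d;" % cp  # numeric fallback


print(render_char("\u00e9"))                        # &eacute;
print(render_char("\u0142"))                        # &#322;
print(render_char("\u00e9", "numeric"))             # &#233;
print(render_char("\u0142", "encoded", "latin-1"))  # &#322;
```

The last call shows the guard: U+0142 cannot be encoded in Latin-1, so
the sketch falls back to an entity instead of emitting the raw character.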
