[bug #40720] [UPGRADE] improve Unicode support

G. Branden Robinson Tue, 25 Nov 2025 22:50:01 -0800

Follow-up Comment #10, bug #40720 (group groff):

A more practical point implicating the UTF-8 migration is that right now we
accept as identifiers code point sequences that are invalid as UTF-8.


Here's an example from our friend bug #67734 again.


printf '.nr \311l 57\n.tm \\n(\311l\n'


(That's "Él".)

This is invalid UTF-8 because \311 (0xC9) starts a two-byte sequence; such a
lead byte must be followed by a continuation byte, which has its high bit set,
but "l" (0x6C) does not.
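
As a rough illustration of that reasoning (not anything GNU troff does
itself), the following Python sketch checks the two bytes from the example
against a strict UTF-8 decoder.  The helper name `is_valid_utf8` is made up
for this sketch, and the ISO 8859-1 reading of \311 as "É" is assumed from
the "Él" remark above.

```python
# Bytes of the register name in the example: octal 311 (0xC9), then "l" (0x6C).
name = b"\311l"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xC9 has the bit pattern 110xxxxx, which marks the lead byte of a
# two-byte sequence; 0x6C has its high bit clear, so it cannot be the
# continuation byte (10xxxxxx) that must follow.
print(format(name[0], "08b"))   # 11001001
print(format(name[1], "08b"))   # 01101100
print(is_valid_utf8(name))      # False

# Read as ISO 8859-1 instead, the same two bytes spell "Él" (U+00C9, "l").
print(name.decode("latin-1") == "\u00c9l")        # True

# The UTF-8 encoding of "Él" is three bytes and does validate.
print(is_valid_utf8("\u00c9l".encode("utf-8")))   # True
```
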

Bug #67734 reasons that we might as well start rejecting those things now,
because when we eventually land direct reading of UTF-8 in GNU _troff_, we'll
be doing so at that time anyway.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?40720>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
