Follow-up Comment #14, bug #67735 (group groff):

At 2026-03-19T13:50:36-0400, Dave wrote:
> Follow-up Comment #13, bug #67735 (group groff):
>
> [comment #0 original submission:]
>> It's necessary to nail this down to migrate the underlying
>> representation type to something wide enough to hold Unicode
>> code points.
>
> Said migration is now bug #68129, which per the above I've made
> dependent on this ticket.

I guess I could illuminate my plans here.

Step 1: Migrate handling of input characters to `unsigned char`.

Step 2: Create new structure type `grochar`, consisting solely of an
        `unsigned char` initially. This is to make the type opaque to
        C++'s C-based legacy type system, which aggressively
        interconverts integral types. `typedef` is one of the most
        misleading programming language keywords ever devised.

Step 3: Convert `grochar` into a more elaborate `class` or `struct` that
        uses a wider type (likely `char32_t`) internally and includes a
        constructor handling _signed_ characters read from input.

        Because UTF-8 is a variable-length encoding, I don't see how we
        can handle it without tightly coupling this type with GNU
        troff's input stream reader code. Either this class or string
        and file stream readers need to be prepared to "pump" the input
        stream to collect enough bytes to decide the validity of a
        (variable-length) input character.

Step 4: ???
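
To make Steps 2 and 3 more concrete, here is a minimal sketch of the idea, not code from the groff tree. Only the name `grochar` and the choice of `char32_t` come from the plan above; `byte_source`, `read_status`, and `read_grochar` are hypothetical names invented for illustration, and the in-memory byte source merely stands in for GNU troff's real string and file readers. The `explicit` constructors are what keep the type opaque to implicit integral conversions, and the reader "pumps" the source one byte at a time until it can judge the sequence.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the Step 2/3 type; not groff's actual code.
struct grochar {
  char32_t code;
  // `explicit` blocks silent conversion from other integral types.
  explicit grochar(char32_t c) : code(c) {}
  // Input bytes may arrive as (possibly signed) `char`; go through
  // `unsigned char` so that 0x80..0xFF do not sign-extend.
  explicit grochar(char c) : code(static_cast<unsigned char>(c)) {}
};

// Unlike a typedef, the struct does not interconvert with integers.
static_assert(!std::is_convertible<char32_t, grochar>::value,
              "no implicit conversion into grochar");
static_assert(!std::is_convertible<grochar, int>::value,
              "no implicit conversion out of grochar");

// Hypothetical in-memory byte source standing in for an input stream.
struct byte_source {
  std::string data;
  std::size_t pos;
  explicit byte_source(const std::string &s) : data(s), pos(0) {}
  // Next byte as 0..255, or -1 at end of input; does not consume it.
  int peek() const {
    return pos < data.size() ? static_cast<unsigned char>(data[pos]) : -1;
  }
  void advance() { ++pos; }
};

enum class read_status { ok, end_of_input, invalid };

// Pump bytes from `in` until one UTF-8 sequence is complete or is
// known to be malformed. `out` is modified only on success.
read_status read_grochar(byte_source &in, grochar &out) {
  int b = in.peek();
  if (b < 0)
    return read_status::end_of_input;
  in.advance();
  int extra;      // number of continuation bytes still expected
  char32_t cp;    // code point accumulated so far
  char32_t min;   // smallest value legal for this length (no overlongs)
  if (b < 0x80)                { extra = 0; cp = static_cast<char32_t>(b); min = 0; }
  else if ((b & 0xE0) == 0xC0) { extra = 1; cp = static_cast<char32_t>(b & 0x1F); min = 0x80; }
  else if ((b & 0xF0) == 0xE0) { extra = 2; cp = static_cast<char32_t>(b & 0x0F); min = 0x800; }
  else if ((b & 0xF8) == 0xF0) { extra = 3; cp = static_cast<char32_t>(b & 0x07); min = 0x10000; }
  else
    return read_status::invalid;  // stray continuation or illegal lead byte
  for (int i = 0; i < extra; ++i) {
    int c = in.peek();
    if (c < 0 || (c & 0xC0) != 0x80)
      return read_status::invalid;  // truncated; offending byte left unread
    in.advance();
    cp = static_cast<char32_t>((cp << 6) | static_cast<char32_t>(c & 0x3F));
  }
  // Reject overlong forms, surrogates, and values beyond U+10FFFF.
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return read_status::invalid;
  out = grochar(cp);
  return read_status::ok;
}
```

Whether the pumping loop belongs in the character class or in the stream readers is exactly the open coupling question from Step 3; the sketch puts it in a free function only to keep the example short.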
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?67735>
