Hi John,

At 2026-03-02T05:28:45+1100, John Gardner wrote:
> I was overjoyed to hear about this release. Until I got to this part
> of the release notes:
I had a similar experience reading your email. :(
> GNU troff no longer accepts C0 controls characters in identifiers. We
> prohibit C0 controls to make the language less tolerant of unreadable
> input
>
> There's no reason to have done this.
Yes, there is. See the 2nd paragraph in the "Other Differences"
section of groff_diff(7), or the corresponding section of groff's
Texinfo manual.
  Other differences
     ...
     Use of C0 control characters in identifiers is not portable;
     Solaris, Plan 9, and Heirloom Doctools troffs accept Control+B,
     Control+C, Control+E, Control+F, and Control+G (only); DWB 3.3
     troff does not. GNU troff rejects C0 controls in identifiers with
     an error diagnostic.
> You've foisted a needlessly disruptive change upon users that breaks
> compatibility with older Groff versions and other Troff
> implementations.
I'd say it isn't needlessly disruptive, because anyone whose documents
must also work with DWB 3.3 troff is going to be disrupted anyway.
> CSTR-54's section on valid character input is predictably terse, but
> it clearly states in §10.1, *Input character translations*
> <https://troff.org/54.pdf#page=21>, that
>
> > Ways of inputting the valid character set were discussed in §2.1.
> > The ASCII control characters horizontal tab (§9.1), SOH (§9.1), and
> > backspace (§10.3) are discussed elsewhere. The newline delimits
> > input lines. *In addition, STX, ETX, ENQ, ACK, and BEL (also known
> > as Control+B, Control+C, Control+E, Control+F, and Control+G) are
> > accepted*,
...as input characters, yes, and GNU troff has made no change here.
$ nroff --version | head -n 1
GNU nroff (groff) version 1.24.0
$ printf '.ds S john\002\003\005\006\007gardner\n.nf\n\\*S\n.pl \\n(nlu\n' \
| nroff | od -c
0000000   j   o   h   n 002 003 005 006  \a   g   a   r   d   n   e   r
0000020  \n
0000021
> > and may be used as delimiters
Yup. That's unchanged too.
$ printf '.nf\n.ie \006john gardner\006john gardner\006 true\n.el false\n.pl \\n(nlu\n' \
| nroff
true
> > or translated into a graphic with tr (§10.5).
And this is unchanged as well.
$ printf '.nf\n.tr \002!\003#\005$\006%%\007&\nhello\002\003\005\006\007john gardner\n.pl \\n(nlu\n' \
| nroff
hello!#$%&john gardner
GNU troff 1.24.0 is fulfilling the expectations one can validly infer
from the text of CSTR #54. That document does not claim that these
control characters are valid in _all_ contexts. It doesn't mention
identifiers there; it also doesn't mention register format expressions
or tab stop types.
Should GNU troff also be accepting of '.af nn <BEL><STX>'? What would
that _mean_? What about the selection of tab stop types with the `ta`
request?[1]
> > All others are ignored.
CSTR #54 does not explicitly countenance the use of these characters in
identifiers, and DWB 3.3[2] exercised this wiggle room just as GNU troff
now does.
> Consider the amount of work that's been put into making Groff
> backwards-compatible with historical Troffs (even in this new
> release).
I believe I have a notion thereof.
> How does it make sense to support *most* historical Troff macros and
> syntax, yet make a selective exception for a feature that,
> realistically, few Groff users know about, and even fewer make use of?
Because it's not generally portable anyway. See above.
Here's the commit message:
commit b6a737385406f9fd3df4ece0a4814b9fd1a500d9
Author: G. Branden Robinson <[email protected]>
Date: Thu Dec 25 23:04:21 2025 -0600
[troff]: Fix Savannah #67734.
* src/roff/troff/input.cpp: Add new Boolean-valued parameter to
`read_input_until_terminator()`, `want_identifier`, defaulting to
false, so that we can distinguish callers that want a GNU troff
identifier from those gathering some other kind of input. This is so
that we can reject (all) C0 control and Latin-1 Supplement
characters in identifiers. (C1 controls are already rejected on
input.) The prohibition against C0 controls is to make the language
less tolerant of unreadable input, and the latter is to enable us to
pivot to reading UTF-8-encoded input in a future release.
(read_input_until_terminator): Update declaration to add new parameter
with default value. Update definition to reject, with error
diagnostic, character codes less than 32 and greater than
159. Add assertion that the putative identifier character is not a
space (character code 32); these have never been valid in *roff
identifiers. This function's callers must ensure that the terminator
precedes any space in the input.
Fixes <https://savannah.gnu.org/bugs/?67734>.
NEWS: Report change.
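For anyone who wants the rule in executable form: the identifier check
the commit message describes can be approximated, byte by byte, with a
POSIX shell function. This is a minimal sketch, assuming single-byte
(ASCII or Latin-1) input; the name `roff_ident_ok` is hypothetical, and
the actual check lives in `read_input_until_terminator()` in
src/roff/troff/input.cpp, not here.

```sh
#!/bin/sh
# Sketch only: approximates the identifier check described in the commit
# message.  roff_ident_ok is a hypothetical name; this is not groff code.
roff_ident_ok() {
    # An empty string is not a usable identifier.
    [ -n "$1" ] || return 1
    # od prints each byte of the candidate as an unsigned decimal number;
    # the unquoted expansion splits that list into one word per byte.
    for code in $(printf '%s' "$1" | od -An -v -tu1); do
        # Codes 0-31 are C0 controls and 160-255 are Latin-1 Supplement
        # characters; both are rejected.  Code 32 (space) has never been
        # valid in a *roff identifier either.  Codes 128-159 (C1 controls)
        # pass here because GNU troff rejects them earlier, on input.
        if [ "$code" -le 32 ] || [ "$code" -gt 159 ]; then
            return 1
        fi
    done
    return 0
}

# Usage: try a plain name, one containing STX, and one containing a
# Latin-1 e-acute (octal 351, decimal 233).
for name in john "$(printf 'john\002gardner')" "$(printf 'caf\351')"; do
    if roff_ident_ok "$name"; then echo accept; else echo reject; fi
done
```

Run as a script, the loop prints "accept", "reject", "reject": only the
name free of C0 controls and Latin-1 Supplement characters passes.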
Finally, I'll note that not only was that ticket filed in Savannah, and
thus reflected to bug-groff, where intrepid users can keep an eye on my
wild ideas and talk me down from unwise ones, but for those of less
stout heart there is also this list, as well as the _much_
lower-traffic info-groff list.
https://lists.gnu.org/archive/html/info-groff/
This change was advertised four times in the last two months, on that
list and this one, in an email closely resembling the one to which
you've replied. Why did you not raise your concern sooner?
Regards,
Branden
[1] CSTR #54 §9.2 "Set tab stop and types. t=R, right adjusting; t=C,
centering; t absent, left adjusting." It doesn't say that other
types are _prohibited_. If one exercises that nonimplication, then
a problem of ambiguity immediately arises; what if the character(s)
chosen for a "tab stop type" are themselves valid in a numeric
expression?
[2] I assume, but don't know (because I have no system with a DWB 2.0
troff that I can run), that DWB 2.0 accepted these C0 characters in
identifiers, because Solaris troff, which is a System V troff, which
I _think_ is descended from DWB 2.0 troff, does so. But it's
hazardous to place high confidence in that conclusion, given that the
behavior of no other System V troff, nor of any version of DWB troff
other than 3.3, is available to me or has been reported to me.
