Hi Branden,

> Because it's not generally portable anyway.

I'm aware that C0 control codes in Roff identifiers aren't handled consistently (or at all) between implementations <https://github.com/Alhadis/Roff-Character-Tests>. That's still no reason to abruptly drop support for something that wasn't harming anybody in the first place.

Also, please do remember that I'm referring expressly to 7-bit ASCII in this discussion. Dropping support for C1 control codes and any other byte in the high-bit range is easily justified in the face of eventual UTF-8 support, given the conflict with continuation bytes and everything. No such justification exists for dropping support for those rarely-ACKnowledged (pun intended) characters that linger unused in the first stick of the ASCII table.

> The prohibition against C0 controls is to make the language less tolerant
> of unreadable input

This is a flippant excuse, because if an author goes out of their way to use *control characters* in identifiers, they likely have their own reasons for doing so. Groff has hitherto encouraged authors to take advantage of its more readable extensions in favour of older, hairier syntax, but it has never tried to take that syntax away from authors who insist on, or simply prefer, the old-school way of writing Roff. To my understanding, this is the first time Groff *has* removed part of the language for no practical or compelling reason, only to assert a best practice.

> That document does not claim that these control characters are valid in
> *all* contexts. It doesn't mention identifiers there;

I feel you've misinterpreted my reasons for citing CSTR-54. My point was that these characters are described by historical documentation, and aren't simply an undocumented implementation quirk that nobody bothered to elaborate upon or standardise. In other words, there's *precedent* for continuing to support them, irrespective of how or whether other Troffs choose to do so as well.
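To make the difference concrete, here's a rough C++ sketch of the two behaviours being compared, going only by groff_diff(7) and the commit message quoted further down. The function names are mine and purely hypothetical; neither function is lifted from any troff's actual source, and both are only meant to be asked about C0 codes (0 to 31).

```cpp
#include <vector>

// Hypothetical predicates sketching the behaviour described in this
// thread; they are NOT taken from the source of any troff. Both are only
// meant to be asked about C0 control codes (0 to 31).

// Solaris, Plan 9 and Heirloom Doctools troffs, per groff_diff(7): STX,
// ETX, ENQ, ACK and BEL (Control+B, C, E, F and G) are accepted in
// identifiers, and no other C0 control is.
bool legacy_accepts_c0(unsigned char c) {
    return c == 0x02 || c == 0x03 || c == 0x05 || c == 0x06 || c == 0x07;
}

// GNU troff 1.24.0, per the commit message: every code below 32 is
// rejected in an identifier, so this is false for the whole C0 range.
bool groff_1_24_accepts_c0(unsigned char c) {
    return c >= 32;
}

// List the C0 codes that a given policy would allow in an identifier.
std::vector<int> accepted_c0(bool (*policy)(unsigned char)) {
    std::vector<int> accepted;
    for (int c = 0; c < 32; ++c) {
        if (policy(static_cast<unsigned char>(c))) {
            accepted.push_back(c);
        }
    }
    return accepted;
}
```

Calling `accepted_c0(legacy_accepts_c0)` gives the five codes 2, 3, 5, 6 and 7 (octal \002, \003, \005, \006, \007), whereas `accepted_c0(groff_1_24_accepts_c0)` gives an empty list. Those five are exactly the characters I'm asking to keep.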

> This change was advertised four times in the last two months, on that list
> and this one, in an email closely resembling the one to which you've
> replied. Why did you not raise your concern sooner?

Because over the last couple of years, I've grown increasingly shit at keeping up to date with the latest developments in software projects I used to follow more proactively. This isn't even the first time in the past fortnight that I've learned, far too late, of a change I wish I'd known about when it was proposed, so that I *could* have voiced my objections. That's just a failing on my part; it's normally through big, important announcements like this one that I learn of things that could've benefited from my input. I'm sorry.

On Mon, 2 Mar 2026 at 07:03, G. Branden Robinson <[email protected]> wrote:

> Hi John,
>
> At 2026-03-02T05:28:45+1100, John Gardner wrote:
> > I was overjoyed to hear about this release. Until I got to this part
> > of the release notes:
>
> I had a similar experience reading your email. :(
>
> > GNU troff no longer accepts C0 controls characters in identifiers. We
> > prohibit C0 controls to make the language less tolerant of unreadable
> > input
> >
> > There's no reason to have done this.
>
> Yes, there is. See the 2nd paragraph in the "Other Differences"
> section of groff_diff(7), or the corresponding section of groff's
> Texinfo manual.
>
>   Other differences
>     ...
>     Use of C0 control characters in identifiers is not portable;
>     Solaris, Plan 9, and Heirloom Doctools troffs accept Control+B,
>     Control+C, Control+E, Control+F, and Control+G (only); DWB 3.3
>     troff does not. GNU troff rejects C0 controls in identifiers with
>     an error diagnostic.
>
> > You've foisted a needlessly disruptive change upon users that breaks
> > compatibility with older Groff versions and other Troff
> > implementations.
>
> I'd say it isn't needlessly disruptive, because anyone working with DWB
> 3.3 troff is going to get disrupted anyway.
>
> > CSTR-54's section on valid character input is predictably terse, but
> > it clearly states in §10.1 *Input character translations*
> > <https://troff.org/54.pdf#page=21> that
> >
> > > Ways of inputting the valid character set were discussed in §2.1.
> > > The ASCII control characters horizontal tab (§9.1), SOH (§9.1), and
> > > backspace (§10.3) are discussed elsewhere. The newline delimits
> > > input lines. *In addition, STX, ETX, ENQ, ACK, and BEL
> > (also known as Control+B, Control+C, Control+E, Control+F, and Control+G)
> > > are accepted*,
>
> ...as input characters, yes, and GNU troff has made no change here.
>
>   $ nroff --version | head -n 1
>   GNU nroff (groff) version 1.24.0
>   $ printf '.ds S john\002\003\005\006\007gardner\n.nf\n\\*S\n.pl \\n(nlu\n' \
>     | nroff | od -c
>   0000000   j   o   h   n 002 003 005 006  \a   g   a   r   d   n   e   r
>   0000020  \n
>   0000021
>
> > > and may be used as delimiters
>
> Yup. That's unchanged too.
>
>   $ printf '.nf\n.ie \006john gardner\006john gardner\006 true\n.el false\n.pl \\n(nlu\n' \
>     | nroff
>   true
>
> > > or translated into a graphic with tr (§10.5).
>
> And this is unchanged as well.
>
>   $ printf '.nf\n.tr \002!\003#\005$\006%%\007&\nhello\002\003\005\006\007john gardner\n.pl \\n(nlu\n' \
>     | nroff
>   hello!#$%&john gardner
>
> GNU troff 1.24.0 is fulfilling the expectations one can validly infer
> from the text of CSTR #54. That document does not claim that these
> control characters are valid in _all_ contexts. It doesn't mention
> identifiers there; it also doesn't mention register format expressions
> or tab stop types.
>
> Should GNU troff also be accepting of '.af nn <BEL><STX>'? What would
> that _mean_? What about the selection of tab stop types with the `ta`
> request?[1]
>
> > > All others are ignored.
>
> CSTR #54 does not explicitly countenance the use of these characters in
> identifiers, and DWB 3.3[2] exercised this wiggle room as GNU troff now
> has.
>
> > Consider the amount of work that's been put into making Groff
> > backwards-compatible with historical Troffs (even in this new
> > release).
>
> I believe I have a notion thereof.
>
> > How does it make sense to support *most* historical Troff macros and
> > syntax, yet make a selective exception for a feature that,
> > realistically, few Groff users know about, and even fewer make use of?
>
> Because it's not generally portable anyway. See above.
>
> Here's the commit message:
>
> commit b6a737385406f9fd3df4ece0a4814b9fd1a500d9
> Author: G. Branden Robinson <[email protected]>
> Date: Thu Dec 25 23:04:21 2025 -0600
>
>     [troff]: Fix Savannah #67734.
>
>     * src/roff/troff/input.cpp: Add new Boolean-valued parameter to
>       `read_input_until_terminator()`, `want_identifier`, defaulting to
>       false, so that we can distinguish callers that want a GNU troff
>       identifier from those gathering some other kind of input. This is so
>       that can we can reject (all) C0 control and Latin-1 Supplement
>       characters in identifiers. (C1 controls are already rejected on
>       input.) The prohibition against C0 controls is to make the language
>       less tolerant of unreadable input, and the latter is to enable us to
>       pivot to reading UTF-8-encoded input in a future release.
>
>       (read_input_until_terminator): Update declaration to add new
>       parameter with default value. Update definition to reject, with
>       error diagnostic, character codes less than 32 and greater than
>       159. Add assertion that the putative identifier character is not a
>       space (character code 32); these have never been valid in *roff
>       identifiers. This function's callers must ensure that the terminator
>       precedes any space in the input.
>
>     Fixes <https://savannah.gnu.org/bugs/?67734>.
>
>     NEWS: Report change.
>
> Finally, I'll note that not only was that ticket filed in Savannah,
> and thus reflected to bug-groff where intrepid users can keep an eye on
> my wild ideas and talk me down from unwise ones, but for those of less
> stout heart there is this list and the _much_ lower-traffic info-groff
> list.
>
> https://lists.gnu.org/archive/html/info-groff/
>
> This change was advertised four times in the last two months, on that
> list and this one, in an email closely resembling the one to which
> you've replied. Why did you not raise your concern sooner?
>
> Regards,
> Branden
>
> [1] CSTR #54 §9.2 "Set tab stop and types. t=R, right adjusting; t=C,
>     centering; t absent, left adjusting." It doesn't say that other
>     types are _prohibited_. If one exercises that nonimplication, then
>     a problem of ambiguity immediately arises; what if the character(s)
>     chosen for a "tab stop type" are themselves valid in a numeric
>     expression?
>
> [2] I assume, but don't know (because I have no system with a DWB 2.0
>     troff that I can run) that DWB 2.0 accepted these C0 characters in
>     identifiers because Solaris troff, which is a System V troff, which
>     I _think_ is descended from DWB 2.0 troff, does so. But it's
>     hazardous to place high confidence in that conclusion given that the
>     behavior of no other specimens of a System V troff or any version of
>     DWB troff other than 3.3, is available or has been reported to me.
