Follow-up Comment #15, bug #68420 (group groff):
Hi Dave,
At 2026-06-12T00:35:16-0400, Dave wrote:
> Follow-up Comment #14, bug #68420 (group groff):
> [comment #13:]
>> But I did not say CSTR #54 was a _specification_. I said that
>> it "did not specify" an aspect of behavior.
>
> That may have been what you _meant_ (and I fully agree with it), but
> what you wrote, which I quoted, was, "In _my opinion_, because the
> behavior is unspecified, no reliable prediction can be made." Not
> "unspecified by CSTR #54," but unspecified full stop, there is no
> possible way to determine what the behavior should be, it could be
> chosen by a random-number generator and still be within spec.
>
>> This road leads to Hyrum's Law.
>
> Sure, any unthinking adherence to some holy work leads to rigid
> ideology. But one could also apply that to "unspecified by CSTR #54 =
> unspecified, period." There are other sources of truth that can guide
> us.
Okay--I concede these philosophical points. :)
>> And your point leaves unaddressed the question of what to do when
>> distinct AT&T troff implementations disagree _with each other_,
>
> True, because my point is not to establish a rigid ideology, but to
> point out that there can be multiple sources of truth, and if they
> disagree with each other, some logic and common sense have to be
> applied.
I concur, and would point to bug #42675 as an illustrative example.
> Further, I had no evidence that the question Martin posed revealed
> divergent AT&T troff behavior, so addressing it seemed moot for the
> present issue.
Okay, well, let's find out!
I've contrived an exhibit that should tell us how the default no-break
control character is handled when a `.cc '` request is issued to "step
on it". First let's observe what idiomatic input that doesn't clobber
a control character looks like.
$ printf "\\&.foo\\c\n'br\nbar\n'cc\nbaz\n" | nroff | cat -s
.foobar baz
I get identical results with GNU, Solaris 10, Plan 9, Heirloom, and DWB
3.3 troffs.
Now let's get weird.
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | solaris10 nroff | cat -s
.foobar baz
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | dwb nroff | cat -s
.foobar baz
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | heirloom nroff | cat -s
.foobar baz
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | 9 nroff | cat -s
.foobar baz
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | ~/groff-1.22.3/bin/nroff |
cat -s
.foo
bar baz
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | ~/groff-1.22.4/bin/nroff |
cat -s
.foo
bar baz
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | ~/groff-1.23.0/bin/nroff |
cat -s
.foo
bar baz
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | ~/groff-1.24.0/bin/nroff |
cat -s
troff:<standard input>:1: error: ignoring control character change request;
the no-break control character is already "'"
troff: ../src/roff/troff/input.cpp:298: void
assign_control_character_request(): Assertion `assignment_worked' failed.
groff: error: troff: Aborted (core dumped)
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | ~/groff-1.24.1/bin/nroff |
cat -s
troff:<standard input>:1: error: ignoring control character change request;
the no-break control character is already "'"
troff: ../src/roff/troff/input.cpp:298: void
assign_control_character_request(): Assertion `assignment_worked' failed.
groff: error: troff: Aborted (core dumped)
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | ~/groff-HEAD/bin/nroff |
cat -s
troff:<standard input>:1: error: ignoring control character change request;
the no-break control character is already "'"
troff:<standard input>:2: error: an escaped 'c' is not allowed in an
identifier
bar baz
So all troffs descended from AT&T code handle what recent versions of
the Sendmail manual are doing the same way. Not a big surprise. (I
didn't bother to fire up SIMH and check real Ossanna troff.)
GNU troff handled it differently, even 12 years ago. I'd guess the same
going all the way back to 1989.
So there's been a major troff implementation that didn't handle
"clobbering" replacement of the control character consistently with
others for a decade plus.
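For anyone who wants to repeat the comparison above, here is a rough
sketch of a helper, not what I actually ran. The function names
(clobber_input, idiomatic_input, run_exhibit) are made up for
illustration, and it assumes each formatter is reachable as a single
command name, which you may need to arrange with local wrapper scripts.
Passing each input line as a `%s` argument sidesteps any question of how
a given printf treats backslashes in its format operand.

```shell
# Emit the "clobbering" exhibit, one roff input line per argument.
clobber_input() {
    printf '%s\n' ".cc '" '.foo\c' "'br" 'bar' "'cc" 'baz'
}

# Emit the idiomatic exhibit, which protects the leading dot with \&.
idiomatic_input() {
    printf '%s\n' '\&.foo\c' "'br" 'bar' "'cc" 'baz'
}

# Run one exhibit (named by $1) through every formatter command given
# after it.  Formatters that are not installed are reported and skipped.
run_exhibit() {
    exhibit=$1
    shift
    for formatter in "$@"
    do
        if command -v "$formatter" >/dev/null 2>&1
        then
            printf '== %s ==\n' "$formatter"
            "$exhibit" | "$formatter" 2>&1 | cat -s
        else
            printf '== %s == (not installed, skipped)\n' "$formatter"
        fi
    done
}
```

Something like `run_exhibit clobber_input nroff` would then show one
labeled block of output per formatter, diagnostics included.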
GNU troff should not have started core dumping in this situation--that
was definitely a bug and so I fixed it.
Whether GNU troff should alter its own historical behavior to align with
the undocumented behavior of AT&T troffs is a separate question.
In my opinion, we should not disrupt groff users for slavish imitation
of dusty corners of AT&T troff behavior merely for the sake of
cross-troff harmony. Especially not when the corner is so dusty that
it has taken this long to come to light. I feel confident that I can
challenge any comer with this question:
"When you select the default no-break control character as the control
character, does the no-break control character have 'break' or
'no-break' semantics?"
While anyone can have an _opinion_, likely one that blesses the
presumption of their input documents, I don't think anyone can point to
a documentary source that supports their view.
And sure enough, Clark's independent GNU troff implementation made a
different choice in this undocumented/unspecified area.
What does neatroff do? It has no _nroff_ so this looks messier.
First, the model unambiguous input using the dummy character.
$ printf "\\&.foo\\c\n'br\nbar\n'cc\nbaz\n" | neatroff
x T utf
x res 720 1 1
x init
x font 1 R
x font 2 I
x font 3 B
x font 4 BI
x font 5 CR
x font 6 HR
x font 7 HI
x font 8 HB
x font 9 S1
x font 10 S
s10
f1
p1
V0
H720
V120
h0h0c.h25cfh33h-4coh50coh50cbh50cah44crh33h25cbh50cah44h1czh44
V7920
x trailer
x stop
Eyeballing it, that looks like what we'd expect.
$ printf ".cc '\n.foo\\c\n'br\nbar\n'cc\nbaz\n" | neatroff
x T utf
x res 720 1 1
x init
x font 1 R
x font 2 I
x font 3 B
x font 4 BI
x font 5 CR
x font 6 HR
x font 7 HI
x font 8 HB
x font 9 S1
x font 10 S
s10
f1
p1
V0
H720
V120
h0c.h25cfh33h-4coh50coh50
H720
V240
h0cbh50cah44crh33h25cbh50cah44h1czh44
V7920
x trailer
x stop
Hm-HMM! See that honking vertical motion ('V240') amid the glyphs?
So neatroff behaves like groff <= 1.23.
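If eyeballing device-independent output is not your idea of fun, a small
filter can report the vertical positions at which glyphs are actually
set. This is only a sketch: glyph_baselines is a made-up name, and it
assumes the output looks like the two dumps above, with absolute
vertical positions on lines of their own ('V' plus a number) and glyph
runs on lines starting with 'h' or 'c'.

```shell
# Print each distinct vertical position at which glyphs are set.
glyph_baselines() {
    awk '
        # A line like V120 records the current absolute vertical position.
        /^V[0-9]+$/ { v = substr($0, 2); next }
        # A glyph run starts with a horizontal motion or a glyph command.
        /^[hc]/ && !(v in seen) { seen[v] = 1; print v }
    '
}
```

Fed the first dump above, this reports the single baseline 120; fed the
second, it reports 120 and 240, which is the break showing up.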
>> I'm not willing to elevate an implementation to specification
>> status.
>
> Well, if the roff language had a specification at all, our lives would
> be easier.
Only in some ways, I think. :-O
> But historically in groff development, AT&T troff behavior has been
> used to fill in gaps in CSTR #54, and sometimes even to overrule CSTR
> #54 altogether (as explored in bug #68366 and bug #64440 issue #2).
> So it goes against this history to claim that "unspecified by CSTR #54
> = unspecified, period."
Yes, sometimes.
> Now, to put all the above in perspective: as a practical matter I
> think this is largely academic for this ticket's issue. Comment #8
> points out that the construction under discussion is nonidiomatic--but
> even if it is used, giving .cc literally any character besides '
> avoids the mess altogether. Trying to set both flavors of control
> character to the same character doesn't give the user any novel
> functionality.
Right.
> But if various AT&T troffs behave consistently with this input, and if
> that behavior is rational, that's a good reason for groff to follow
> suit, despite CSTR #54's silence.
And that's where I disagree. "...if that behavior is rational".
I don't think there is any _rational_ reason to prefer AT&T over
GNU/neatroff semantics for a clobbered no-break control character.
Which semantics one selects is a coin flip.
I remain reluctant to make any deeper change to the formatter here, or
to add a "compatibility mode escape hatch", but as an exercise for the
reader I leave the development of an exhibit exploring the dual of the
foregoing scenario.
What happens with all these troffs when the `c2` request is used
complementarily?
.c2 .
yadda yadda\c
'br
yadda
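As a starting point for that exercise, the input above can be generated
in the same style as the earlier exhibits. This is a sketch only;
dual_input is a name I made up, and I make no prediction here about what
any formatter does with it.

```shell
# Emit the "dual" exhibit: the c2 request selects the default control
# character as the no-break control character.
dual_input() {
    printf '%s\n' '.c2 .' 'yadda yadda\c' "'br" 'yadda'
}
```

Piping it along the lines of `dual_input | nroff | cat -s`, once per
formatter, gives the raw material for the comparison.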
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?68420>
