Follow-up Comment #15, bug #67372 (group groff): At 2025-07-29T09:29:06-0400, Ingo Schwarze wrote: > [comment #11 comment #11:] > >> Here's a proof-of-concept patch, fixing up only the `\A` escape >> sequence. > > Your patch is absolutely not OK. > In fact, it significantly exacerbates the regression.
I reject your characterization that it's a regression. As I see it, the input you were feeding to GNU troff was syntactically invalid and your test cases dependent on *unspecified* formatter behavior. > Here is what i tested. > 1. I replaced s/want_att_compat/compatible_flag/ in your patch. > 2. I added the resulting modified patch to the OpenBSD port. > 3. The patch, configure, build, fake, and package stages all went > through with no suspicious output. > 4. I installed the modified port. > 5. Nothing changes in the rest of the test suite, which is good as > far as that goes. > > But while the roff/esc/A1 test exposes two regressions in \A handling > in 1.23, with \A otherwise still in working order, with your patch > > 1. These two regressions become significantly worse: > With your patch, groff now prints > string expansion: 0xwordxu > register expansion: 0121u > instead of the expected > string expansion: 1u > register expansion: 1u > which is much worse than without the patch because it now leaks \A > arguments to the output > rather than only losing a single suffix character. > > 2. The scope of the regression expands massively. > In addition to not fixing the problem (and making it worse), > the \A sequence is almost entirely broken with the patch. > With one exception (the final "unterminated sequence" test), > the patch causes every single one of the "Argument delimiter" > tests to fail very badly: > most of them switching value from 1 to 0 *and* leaking arguments > to the output. > > 3. The regression of the first test, the \A\r test, is particularly > dire. That test now causes this output (observe the almost blank > line with the "u", and the ruined section title): > u > Argument delimitersword > unsupported \r: 0 > instead of the expected > > Argument delimiters > unsupported \r: 1u I don't know that your tests are sound given the grammar of GNU _troff_. First I should quote the manual again, though. The following escape sequences don't take arguments and thus are allowed as delimiters: '\<SPC>', '\%', '\|', '\^', '\{', '\}', '\'', '\`', '\-', '\_', '\!', '\?', '\)', '\/', '\,', '\&', '\:', '\~', '\0', '\a', '\c', '\d', '\e', '\E', '\p', '\r', '\t', and '\u'. However, using them this way is discouraged; they can make the input confusing to read. And lest you suspect that the foregoing language is some contrivance of mine, here's the _groff_ 1.22.3 Texinfo manual: ...but it is better not to use this feature to avoid confusion. The following escapes sequences (which are handled similarly to characters since they don't take a parameter) are also allowed as delimiters: '\%', '\ ', '\|', '\^', '\{', '\}', '\'', '\`', '\-', '\_', '\!', '\?', '\)', '\/', '\,', '\&', '\:', '\~', '\0', '\a', '\c', '\d', '\e', '\E', '\p', '\r', '\t', and '\u'. Again, don't use these if possible. How old _is_ this language? A few "git blames" later, I see: a7cac213d3 doc/groff.texinfo (Werner LEMBERG 2000-03-28 09:26:07 +0000 5870) The following escapes sequences (which are handled similarly to a7cac213d3 doc/groff.texinfo (Werner LEMBERG 2000-03-28 09:26:07 +0000 5871) characters since they don't take a parameter) are also allowed as a7cac213d3 doc/groff.texinfo (Werner LEMBERG 2000-03-28 09:26:07 +0000 5872) delimiters: @code{\%}, @w{@samp{\ }}, @code{\|}, @code{\^}, @code{\@{}, a7cac213d3 doc/groff.texinfo (Werner LEMBERG 2000-03-28 09:26:07 +0000 5873) @code{\@}}, @code{\'}, @code{\`}, @code{\-}, @code{\_}, @code{\!}, 6603276718 doc/groff.texinfo (Werner LEMBERG 2007-09-15 08:10:46 +0000 5874) @code{\?}, @code{\)}, @code{\/}, @code{\,}, @code{\&}, @code{\:}, 6603276718 doc/groff.texinfo (Werner LEMBERG 2007-09-15 08:10:46 +0000 5875) @code{\~}, @code{\0}, @code{\a}, @code{\c}, @code{\d}, @code{\e}, 7a267a0726 doc/groff.texinfo (Werner LEMBERG 2008-09-25 07:48:15 +0000 5876) @code{\E}, @code{\p}, @code{\r}, @code{\t}, and @code{\u}. Again, don't 7a267a0726 doc/groff.texinfo (Werner LEMBERG 2008-09-25 07:48:15 +0000 5877) use these if possible. Notice that the foregoing necessarily implies (by set complementation) that *escape sequences that take arguments aren't allowed as delimiters*. Given that, let's have a look. I'll annotate your cases with roff-style comments. $ cat ~/src/CVS/mandoc/regress/roff/esc/A1.in .\" $OpenBSD: A1.in,v 1.1 2022/06/08 13:08:00 schwarze Exp $ .Dd $Mdocdate: June 8 2022 $ .Dt ESC-A1 1 .Os .Sh NAME .Nm esc-A1 .Nd the roff escape A sequence: identifier syntax validation .Sh DESCRIPTION empty: \A'' \" valid as input, delimited contents not valid as an identifier .br letters: \A'word' \" valid as input, delimited contents also valid as identifier .br blank: \A'two words' \" valid as input, delimited contents not valid as an identifier .br ASCII non-letters: \AA!"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`{|}~A \" valid as input, delimited contents also valid as an identifier .br invalid escapes: \A'\\\.\G' \" valid as input, delimited contents not valid as an identifier .Ss Argument delimiters unsupported \er: \A\rword\ru \" valid as input, delimited contents also valid as identifier .br ignored \e&: \A\&word\&u \" valid as input, delimited contents also valid as identifier .br useless \e.: \A\.word.u \" UNSPECIFIED: `\.` is not, technically, an escape sequence .br invalid \eG: \A\GwordGu \" UNSPECIFIED: `\G` is not a defined escape sequence .br special \e-: \A\-word\-u \" valid as input, delimited contents also valid as identifier .br break \ep: \A\pword\pu \" valid as input, delimited contents also valid as identifier .br nospace \ec: \A\cword\cu \" valid as input, delimited contents also valid as identifier .\".br .\"skipchar \ez: \A\zword\zu \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br .ds mystr xwordxu string expansion: \A\*[mystr] \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br .nr myreg 121 register expansion: \A\n[myreg]u \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br ignored \eON: \A\O1word\O2u \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br special character: \A\(hyword\(hyu \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br ignored \eZ\(aqstr\(aq: \A\Z'foo'word\Z'bar'u \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br horizontal motion: \A\h'1'word\h'3'u \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br horizontal line: \A\l'4'word\l'2'u \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br overstrike: \A\o'ab'word\o'cd'u \" ERROR: escape sequences that take arguments are not valid as escape sequence delimiters .br unterminated: \A'word \" ERROR: unterminated delimited argument[*] [*] GNU _troff_ has, from 1.22.3 to Git HEAD at least, merely warned about this input; what usually happens is that the remainder of the input line becomes the argument to the escape sequence, and the parser state resets upon encountering a newline (or end of input). As far as I know, it's unspecified whether input line continuation is honored in this context, and I can imagine a case being made either way. I advise you to fix your test cases. I've opened bug #67375 to address the underspecification problem. > I have no idea why you are bringing compat mode into the discussion. **BECAUSE GNU TROFF CHANGED THE WAY INTERPOLATIONS WERE HANDLED COMPARED TO AT&T TROFF STARTING OVER 35 YEARS AGO, AND YOUR TEST CASES ARE EXPECTING AT&T-LIKE HANDLING OF THE INPUT.** Unfortunately GNU _troff_'s change to the grammar appears not to have been comprehensively applied. You have found some corner cases and elevated the formatter's treatment of, per its manual, *prohibited input* to the status of a specification, such that you term nonconformance with your expectations as "regression". > This regression has absolutely nothing to do with compat mode. It does, inasmuch as you happen to have crafted your tests to expect AT&T _troff_-like interpolation behavior. > You cannot be suggesting that manual pages should be rendered in > compat mode, or that the mandoc test suite should run in compat mode? Not necessarily. But I will *have* to suggest that remedy if you refuse to come to grips with GNU _troff_'s "interpolation depth" concept. And I kind of do expect you to refuse that, because I fear you'll have to change _mandoc_'s parser in ways you don't want to if you accept that concept. Or, you could absorb what the GNU _troff_ manual has said for 17-25 years and stop employing grammatical constructs that _groff_'s Texinfo manual has, since before _mdocml_[*] was written, said are not accepted. Or you could not be so particular about how the formatter recovers from syntactically invalid input. You might run those cases through `groff -Wall -wdelim` and look for non-empty output to the standard error stream. [*] _mandoc_'s previous name, for those reading this dispute for (dubious) entertainment. > I have never used compat mode for anything and i think its very > existence is a terrible idea. You can propose its removal to the _groff_ mailing list, of course. > Software always changes, and the idea that this continuous evolution > can be captured by a single bit of information is delusional. Which > point in time does compat mode target for compatibility? That's a fair point, but given how imprecisely specified many aspects of *roff are in the first place, a "compatibility mode" that satisfactorily renders historical AT&T documents according to veteran practitioners is, probably, "good enough". > So i had to look up how to even enable compat mode because i never > used or tested it... Putting ".cp 1" as the first line into the test > file changes absolutely nothing. Neither does setting the -C command > line option. > > Putting ".cp 1" after the mdoc(7) preamble (right before the ".Sh > NAME" line results in this printout on standard output, *instead* of > the NAME header: > $] .tm Usage: .Sh section_name ... (#0.c]) c-macro-name Sh. > and then two blank lines, > but all parts of the (exacerbated) regression still occur with no > changes. > > Putting ".cp 1" after the ".Ss Argument delimiters" section header > only prevents the *new* regressions that your patch causes, > but the *original* regression that this ticket is about still occurs. > > To summarize, your patch is ineffective for the purposes of this > ticket and causes new regressions that are even worse. It sounds to me that, like Bjarni's obsession with compiler warnings and similar, you're refusing to consider the _context_ of what it is you're asking the formatter to do because you're regarding an automated test failure as inherently defective, as opposed to being a diagnostic tool. > I think your development methodology is not quite right. > > It started off well enough: > > 1. A regression is reported. > 2. You identified the commit that caused it (thanks for that!). > (I did not yet check whether reverting that commit, or the relevent > part(s) of it, indeed fixes the regression.) > 3. That commit was *not* intended to change behaviour. > Indeed, the commit message says "refactor". > > At this point, the logical next step is to _just fix the bloody > regression_ No, the logical next step is to _understand the behavior change_. That includes a consideration of whether the input was well-formed in the first place. > by reverting the defective commit, or the defective parts of it, > then consider what can be done to reduce the probability of future > refactoring commits causing regressions in a similar way. I explained at length and with citations to the GNU _troff_ manual why your test cases are invalid inputs. > Instead you chose to > > 4. Research historic behaviour ab initio. Yup. No *roff in the world has ever been written for the purpose of passing _mandoc_'s test suite. > 5. The result was (at least for \w; i don't think discussing \b for > nroff(1) makes much sense, \b is perfectly well defined for nroff mode. It stacks glyphs vertically as in troff mode (with coarser device resolution). You're flaunting your ignorance with this statement. (That said, it _is_ a weird feature.) > and i didn't check what's going on for \o yet since the roff/esc/o > test does not indicate any regression) that \w behaviour has been > consistent from AT&T all the way through to groff-1.22.4, which > impressively confirms what we already knew after steps 2 & 3, i.e. > that there is a regression. It does nothing of the sort. > Now instead of fixing the regression, you take off on a breathtaking > tangent about an unrelated topic, namely, compatibility mode. That > doesn't make sense to me. I'm talking about compatibility mode because what you seem to desire and expect is AT&T _troff_-style interpolation semantics, not GNU _troff_-style. But you don't seem to understand that, because your own oxygen-hungry tangent fails to mention, let alone grapple with, the concept of "input level" (a.k.a "interpolation depth") which is fundamental to the operation of GNU _troff_'s parser. The term is right there in the code you're wanting to patch (`input_stack::get_level()`). GNU _troff_ has long had defects in the sense that it hasn't aggressively validated its input stream. Not a surprise--C and C++ developers are not Haskell programmers. Over the years, I've tightened up some of the gaps. But I might not ever make it perfectly rigorous. When not in compatibility mode, the character that follows an escape function selector (that accepts a delimiter) MUST be a valid delimiter _prior_ to application of any interpolation operations to the stream, or the input is invalid. When in compatibility mode, the character that follows the escape function selector can be anything, and will be greedily, recursively interpolated. If what ends up after the escape function selector is a valid delimiter character, the formatter marches merrily on. > Sure, i'm not saying we cannot discuss compatibility mode (even though > i consider compat mode mostly a waste of time and am not particularly > keen on even considering it at all), but when a regression is found, > normal development methodology is: > > 1. Fix the regression by reverting the defective change, or the > defective part of it. Show me a document--a practical document, not a test case--that fails to render correctly due to this behavior change. There's a reason we've gone more than two years without anyone noticing it. > 2. Consider which safeguards are needed to prevent repeat incidents, > in particular in case a regression not only got committed, but > released. Test cases will do. I'm happy to write them in conjunction with this bug fix. Or, rather, a bug fix as _I_ conceive it, which I expect to be more long the lines of my patch in comment #11 than yours in comment #13. > 3. Only after that can other changes to the code in question be > considered, and if those other changes change behaviour (like the > patch you posted here does) that needs a full rationale and a > discussion completely separate from the regression fix. I see that you're a big fan of pipeline stall. > When a regression is found, it is certainly not acceptable to pile > more commits on top without fixing the regression first, and it is > methodologically also not acceptable to discuss other changes in the > context of the regression fix. Your test case(s) exhibited proscribed input. Get off your soapbox. > But can we please focus this ticket exclusively on the regression fix > and discuss any other matters elsewhere? No, because what you identified wasn't a regression. Where did you come up with these test cases? Did you excerpt them from real-world documents? Where are these documents? They look contrived to me--which can be fine, if done with a language grammar ready at hand. Unfortunately, *roff may not even be a decidable language, so no BNF-style grammar for it has ever existed, to the best of my knowledge.[*] Instead we have a manual in English, which you evidently did not consult. [*] I don't actually know for a fact that all grammars expressible in BNF are decidable. > Reacting to regressions by piling more changes on top instead of just > reverting and learning from the mishap is *the* most effective way to > make software unstable, accumulate bugs, and lose control of what one > is doing. You're preaching now. That time would be better spent reading the GNU _troff_ manual to develop a better understanding of the language. _______________________________________________________ Reply to this item at: <https://savannah.gnu.org/bugs/?67372> _______________________________________________ Message sent via Savannah https://savannah.gnu.org/
signature.asc
Description: PGP signature