Re: regressions, gratuitous changes, and improvements; was: bug #67992

G. Branden Robinson Fri, 13 Feb 2026 09:02:51 -0800

Hi Ingo,

I'm sorry it's taken me some time to get back to this; more urgent (but
not more important) things ate up my time.

At 2026-02-05T06:38:17+0100, Ingo Schwarze wrote:
> >>> I need you and Ingo Schwarze to understand that "behavior change"
> >>> is not synonymous with "regression".
> 
> >> This sounds like a straw-man reduction of both our positions.
> 
> > I wish it were.  I _want_ it to be!  Meaning: I want the straw man
> > to be a distortion I don't have to deal with.  The problem I face is
> > that neither of you articulate a limiting principle to what you
> > characterize as "regression" [...]
> 
> Oh, that's fairly easy to answer, though the answer cannot be very
> rigorous.
> 
> First, we need to realize that regressions, gratuitous changes,
> and improvements form a continuum, rather than being rigorously
> distinguishable classes.

I agree.

> First, let's define the extreme cases:
> 
>  * A change in behaviour is a clear regression when it causes
>    idioms to no longer work that are commonly used in code or
>    documents, and the decision to deliberately break compatibility
>    has not been made based on widespread consensus that there
>    are very strong reasons the change is needed and no less
>    disruptive way can be designed to solve the underlying, very
>    serious problem.
> 
>  * A change is clearly a gratuitous change when the improvement
>    is of a very minor nature, in particular when it causes
>    inconvenience for some users and there is no consensus that
>    it is a clear improvement at all.
> 
>  * A change is clearly an improvement when it is fully backward
>    compatible (meaning that code and documents that worked before
>    still work afterwards) and there is consensus that it provides
>    significant benefit, for example improved security, significantly
>    simplified usage, or very important new functionality that does
>    not bloat the user interface.

I think I can work with all of those definitions.

> Obviously, edge cases exist between regressions and gratuitous
> changes on the one hand and between gratuitous changes and
> improvements on the other hand.  The way to deal with edge cases is to
> build consensus and only change behaviour when consensus is reached.
> When consensus cannot be reached, a solution should be sought that
> addresses the underlying issue as well as possible without causing new
> problems for part of the community.

I agree with this analysis, but think it incomplete.  Most software
projects, outside of tightly controlled embedded systems, exist and
operate within larger systems.  Famously, one of Unix's most
characteristic design features is the "pipe", a system resource (kernel-
managed small memory buffer) that makes easy the construction of general
graphs of communicating programs.  Roughly, the composition property
that you get with function calls in programming languages, you can have
with entire programs' I/O using pipes, without having to administer the
creation and sustainment of temporary files for as long as they're
needed, and the destruction thereof when they no longer are.
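
To make the analogy concrete, here is a minimal Python sketch.  The two
inline filter programs are hypothetical stand-ins, not anything from
groff; the point is only that two separate processes joined by a pipe
compose the way two function calls do, with no temporary file involved.

```python
import subprocess
import sys

# Two tiny "programs", each run as its own process (hypothetical filters).
UPPER = "import sys; sys.stdout.write(sys.stdin.read().upper())"
COUNT = "import sys; print(len(sys.stdin.read().split()))"


def run_composed(text):
    """Composition with ordinary function and method calls."""
    return str(len(text.upper().split()))


def run_pipeline(text):
    """The same composition, but between two processes joined by a pipe."""
    first = subprocess.Popen([sys.executable, "-c", UPPER],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    second = subprocess.Popen([sys.executable, "-c", COUNT],
                              stdin=first.stdout, stdout=subprocess.PIPE)
    first.stdout.close()  # the parent no longer needs its copy of the pipe
    first.stdin.write(text.encode())
    first.stdin.close()   # end of input for the first stage
    out = second.communicate()[0]
    first.wait()
    return out.decode().strip()
```

Both run_composed("hello pipe world") and run_pipeline("hello pipe
world") count the same three words; only the second crosses process
boundaries.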

> For example, from 1.23 to 1.24, the default in groff_man(7)
> and groff_mdoc(7) was apparently changed from -rLL=78n to -rLL=80n.
> In my book, that is clearly a gratuitous change, not an improvement,
> because the benefit is very minor at best, if there is any benefit
> at all.  I think it is almost impossible to make any strong argument
> in either direction why one is better than the other; there are
> only weak arguments like "let's improve the efficiency of terminal
> real estate usage by 2.5%" or "this might make diff(1)s between
> formatted manuals look ugly in 80n-wide terminal windows" or "coming
> so close to the right margin looks ugly" - so this mostly boils
> down to personal preference.

I disagree.  What other standard Unix text formatting utility either
(a) assumes a default line length other than 72 or 80 columns, or (b)
accepts user configuration of the line length but then deducts 2 from
that value "to be safe"?

The latter is what was going on with groff, as I explained on this
mailing list,[1] in groff's ChangeLog[2] and NEWS files, and in the
corresponding commit log message.[3]

NEWS:

*  The an (man), doc (mdoc), and doc-old (mdoc-old) macro packages have
   changed the default line length when formatting on terminals from 78n
   to 80n.  The latter is a vastly more common device configuration, but
   that line length had been avoided since the groff 1.18 release in
   July 2002 (prior to that, the line length was 65n, as in AT&T nroff),
   for an undocumented reason.  That reason appears to have been the
   interaction of bugs in GNU tbl(1) with an aspect of grotty(1)'s
   design.  Those bugs have been resolved.  A man(1) program can still
   instruct groff to format for any desired line length by setting the
   `LL` register on {g,n,t}roff's command line.

> However, even though none of the options is clearly better than the
> other, *changing* the default causes disruption for some people.

That's true, and is why the change is documented in the "NEWS" file.

> For example, is that change important enough that changing every
> single one of 524 (five hundred and twenty four) desired output
> files in the mandoc test suite is a consequence worth accepting?
> Changing 524 files and making sure the change to each of the files
> is completely correct does cause significant effort, you know.

Okay, please excuse the blunt language here, but...

Why the HELL do you have 524 automated tests that are ALL sensitive to
the DEFAULT LINE LENGTH?

Are they all _INTENDED_ to test the default line length?

Are you testing 524 different configurations OF the line length?  If so,
that's not necessary; there aren't anywhere near 524 unique execution
traces through formatter logic that involve only different values of the
line length.

I find myself alarmed by mandoc(1)'s testing strategy.

When I was taught about software testing in school, a point that the
instructors harped on was checking "boundary conditions"[5] as well as
the "common case(s)" or "middle of the range".  (Sometimes, testing of
"error paths" was mentioned--but not always.)

Moreover, this is not the way most of groff's own 300+ automated test
scripts are written.  And I will concede at the outset that groff's test
scripts, most of which I wrote, are not puritanical examples of ideal
testing.  Most of them attempt to test one feature at a time, but
especially in the case of GNU troff, the formatter itself, the "unit"
tests are really integration tests as well, because they run the "groff"
wrapper program to get at the formatter's functionality.  That strategy
is notably impure and would probably result in point deductions in a
software engineering course, because I'm exposing my "unit test" to
spurious failures if something goes wrong with the groff(1) wrapper
program.  In practice, that potential hasn't proven to be a hindrance.
(Usually.  :-| [4].  Not a case of spurious test failure, but massively
aggravating to me nevertheless.)

It's NOT crazy to have "golden masters", complete pre-rendered documents
stored in a script or "expected output" file.  This is more like what
some software engineering texts call an "acceptance test", and groff has
some of those.

For example:

contrib/mm/tests/letters-format-correctly.sh
contrib/mm/tests/memoranda-format-correctly.sh

What groff's test suite DOESN'T do is try to make 310 (or whatever)
tests all "acceptance tests" as well.

An advantage of organizing automated tests into multiple types with
different responsibilities is avoidance of precisely the problem you're
facing with your 524 failing tests.  524 different tests _shouldn't_ go
wrong if one configurable parameter in the formatter changes its
default.  Ideally, one unit test script fails, and since many documents
under test involve at least two lines of text, probably several
acceptance and integration tests will as well.

Why do you have 524 of those?
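
A minimal sketch of that division of labor, in Python, assuming a toy
formatter built on textwrap (format_text and DEFAULT_LINE_LENGTH are
hypothetical names, not groff or mandoc interfaces): one test owns the
default, and every other test pins the line length it needs.

```python
import textwrap

DEFAULT_LINE_LENGTH = 80  # hypothetical default of a toy formatter


def format_text(text, line_length=None):
    """Toy stand-in for a formatter: fill text to a line length."""
    if line_length is None:
        line_length = DEFAULT_LINE_LENGTH
    return textwrap.fill(text, width=line_length)


def test_default_line_length():
    # The one test that is *meant* to notice a change of default.
    first_line = format_text("word " * 40).splitlines()[0]
    assert len(first_line) == 79  # 16 four-letter words plus 15 spaces


def test_filling_joins_short_lines():
    # Tests filling itself; the line length is pinned, so the default
    # plays no part in the expected output.
    out = format_text("one\ntwo\nthree", line_length=40)
    assert out == "one two three"
```

If DEFAULT_LINE_LENGTH changed to 78, only test_default_line_length
would need its expectation updated.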

> For example, is it worth disrupting the project goals of the
> mandoc project?

No, but if mandoc sampled groff in 2010 or so and resolved to pour
cement over every measurable aspect of groff behavior, whether specified
or not, whether documented or not, whether known to be a bug or not, and
proceeded to write all of its automated tests on such a presumption of
utter fixity in the software system being simulated, then the
mdocml/mandoc project set itself up for failure in this area.

> As you know, the main goal of the mandoc project is compatibility with
> groff.

Yes.  And I'm happy to work with you to facilitate that goal, to discuss
feature changes with you, to develop plans for staging potentially
disruptive changes, and similar.

What I will not do is abandon groff as a living software system or have
Hyrum's Law[6] imposed forcibly upon its developers by another project.
To accept either would betray my responsibilities as _maintainer_ of
the groff project.  An utterly stagnant system neither has, nor
requires, maintenance.

> As a secondary goal, any software project obviously also wants to be
> compatible with itself.

With other versions of itself, yes.  I agree.  I wouldn't want to have
to rewrite ~310 test scripts with every groff release.  Fortunately,
neither my practices in test script writing nor feature development of
groff threaten such an outcome.

> But such a change makes that impossible: I can no longer
> maintain compatibility with both groff-1.23 and groff-1.24
> and compatibility among mandoc versions.

I'm not happy to hear that.  Apart from ceasing development of groff,
how can I help?

> For example, is it worth confusing innocent bystanders like
> these poor guys in the pod-perldoc project:
> https://github.com/briandfoy/pod-perldoc/pull/79#pullrequestreview-3725604410
> (By the way, thanks to Andrew Fresh (afresh1@) for making me aware of
> that confusion over there.)

That looks like a story that started out unhappy but worked out well,
though I'm tempted to correct the record because I think you let your
passions run away with you and made some distorted claims about groff.

But for the moment I'd like to focus on these words of yours that we can
find in that PR's comment log:

>>> I'm not sure whether you need to test anything at all here.

Indeed.

I'm not sure you need to be testing the default line length in 524
different automated test cases.  Without recalling exactly how you're
conducting your testing--I've looked only once or twice--I'd suggest
writing looser regexes.  Don't try to match entire lines if entire lines
_aren't what are under test_.
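
As a rough illustration of the difference, here is a Python sketch.  It
is not how mandoc's suite actually works; render(), the sample page
text, and both check functions are hypothetical, with textwrap standing
in for the formatter.

```python
import re
import textwrap

BODY = ("frob - frobnicate the input and write the result "
        "to the standard output as plain text")


def render(line_length):
    """Hypothetical stand-in for a formatter rendering a tiny man page."""
    para = textwrap.fill(BODY, width=line_length,
                         initial_indent="     ", subsequent_indent="     ")
    return "NAME\n" + para


# Brittle: a whole-document "golden master" captured at 78 columns.
GOLDEN_78 = (
    "NAME\n"
    "     frob - frobnicate the input and write the result to the standard output\n"
    "     as plain text"
)


def brittle_check(output):
    return output == GOLDEN_78


# Looser: the feature under test is only that the NAME section opens with
# an indented "frob - ..." line, so match just that much.
NAME_LINE = re.compile(r"^NAME\n {5}frob - frobnicate\b", re.MULTILINE)


def loose_check(output):
    return NAME_LINE.search(output) is not None
```

Here brittle_check() accepts render(78) but rejects render(80), because
the word "as" moves up to the first line; loose_check() accepts both.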

> What exactly is the significant benefit justifying all this
> disruption?

You speak of "all this", but each instance of disruption is justified,
discussed, and documented on a case-by-case basis.

That's as it should be, in my opinion.  That's software development.

You can of course revert commit dcf73d56bff9383635f3acb469235768118f21dc
in your official port of groff 1.24 to OpenBSD.  If you do so, please
document the fact prominently for everyone's benefit.

I know better than to expect no accompanying editorial slant.  ;-)

Maybe, over time, you can revise mandoc(1)'s test scripts to stop being
sensitive to aspects of program behavior that are not under test.  Once
accomplished, the reversion could itself be reverted.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2025-10/msg00076.html
[2] https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/ChangeLog?h=1.24.0.rc3#n21730

[3]

commit dcf73d56bff9383635f3acb469235768118f21dc
Author: G. Branden Robinson <[email protected]>
Date:   Mon Jan 1 22:11:42 2024 -0600

    [man,mdoc]: Format for 80n on terminals, not 78n.

    [man,mdoc]: Increase default line length to 80n on terminals.  The
    resolution of tbl(1) bugs such as Savannah #63749, #63640, and #62471
    enables us to discard a decades-old convention of using 78n for the
    default line length when formatting for terminals; this was because
    tbl(1) would sometimes overset lines, with ugly consequences.  (I
    _assume_ this was the reason; no one ever seems to have gone on record
    about it--it was seemingly yet another case of Unix folklore that
    "everybody knew".)  Overset lines are still possible; tbl(1) will warn
    if so.  They can also overset if the document disables filling; the
    author is expected to know what they are doing in that case.

    * tmac/an.tmac:
    * tmac/doc-old.tmac:
    * tmac/mdoc/doc-nroff: Do it.

    * tmac/groff_man.7.man.in:
    * tmac/groff_mdoc.7.man: Update documentation and annotations.

    * tmac/tests/an-ext_SY-and-YS-work.sh:
    * tmac/tests/an_HY-register-works.sh:
    * tmac/tests/an_LL-init-sanely.sh:
    * tmac/tests/an_UE-breaks-before-long-URIs.sh:
    * tmac/tests/an_adjust-link-text-correctly.sh:
    * tmac/tests/an_do-not-abbreviate-escape-using-TH-arguments.sh:
    * tmac/tests/an_title-abbreviation-works.sh:
    * tmac/tests/andoc_flush-between-packages.sh:
    * tmac/tests/doc_indents-correctly.sh:
    * tmac/tests/doc_smoke-test.sh: Update test output expectations.

    * NEWS: Document it.

[4] https://savannah.gnu.org/bugs/?67133

[5] Concretely, checking boundary conditions can catch off-by-one
    errors.

[6] https://www.hyrumslaw.com/
