[looping in groff list because I cover a lot of *roff/man history]

At 2026-01-12T17:20:35+0000, Gavin Smith wrote:
> I've attached the file produced on my system by
> 
> $ MAN_KEEP_FORMATTING=1 man groff >groff.out

Thanks for sending Eli what he requested before I could get to it.

At 2026-01-12T18:04:52+0000, Gavin Smith wrote:
> > > For example, you suggested setting MANROFFOPT=-rU0, which looks
> > > like an option to be passed to a "roff" program by "man", but the
> > > program might not recognise that option, and then you might not
> > > get a manpage at all.
> > 
> > Your objection is premised on incomplete information.
> 
> Indeed, as I am not familiar with groff options.  It was a theoretical
> problem.
> 
> As we are stripping out the sequences instead, it is a moot point.

If you don't intend ever to support hyperlink-style navigation in "info"
for non-Info documents, then there may never be a reason to revisit that
decision.

> On Sun, Jan 11, 2026 at 02:32:39PM -0600, G. Branden Robinson wrote:
> > > Fortunately, it seems that not too many manpages are generated
> > > with these sequences, except groff's own manpages.  I suggest you
> > > do not start outputting these sequences by default for any manpage
> > > cross-references, otherwise there are too many.
> > 
> > On the contrary, the plan is for wider adoption.  Alejandro Colomon

My apologies to Alejandro Colomar for butchering his surname.

> > of the Linux man-pages project has been waiting on me for a while to
> > finish submitting a series of patches that would convert the 3,100
> > or so man pages that project distributes to use of groff man(7)'s
> > `MR` macro, introduced in groff 1.23.0, which enables production of
> > the hyperlinks.
> [...]
> 
> > > The occasional web URL is probably ok.
> > > 
> > > This change to groff output also breaks any other program that
> > > would use the output from "man".
> > 
> > It breaks programs that don't correctly support ECMA-48.
> > Unsupported or malformed escape sequences must be discarded, not
> > emitted literally.
> > 
> > grotty's OSC 8 feature was planned and implemented with substantial
> > consideration, field trials, and user consultation.  
> 
> That's a very abstract statement, which requires trust on the part of
> the reader that it refers to something substantial.

Acknowledged; you've thus informed me of the level of trust that you
have in my statement.

> > The root of the problem observed is info(1)'s poor conformance with
> > ECMA-48.
> 
> Regardless of info's comformance with standards, I advise you to pay
> due concern to the effects on compatibility by changes to groff's
> output.

This is a revealing admonition.  Since you've quoted the GNU Coding
Standards at me, I'll quote groff's own documentation to you.

grotty(1):
Examples
     roff systems are best known for formatting man pages.  A man(1)
     librarian program, having located a page, might render it with a
     groff command.
            groff -t -man -Tutf8 /usr/share/man/man1/groff.1
     The librarian will also pipe the output through a pager, which
     might not interpret terminal escape sequences groff emits for
     boldface, underlining, italics, or hyperlinking; see section
     “Limitations” below.

     To process a roff input file using the preprocessors tbl and pic
     and the me macro package in the way to which AT&T troff users were
     accustomed, one would type (or script) a pipeline.

            pic foo.me | tbl | troff -me -Tutf8 | grotty

     Shorten this pipeline to an equivalent command using groff.

            groff -p -t -me -T utf8 foo.me
...

> The "info" program worked well as a manpage viewer, and with your
> changes it won't, until users upgrade to a newer "info" program (which
> hasn't been released yet, and which in any case would take years).

When you say "info" deals with the output of "groff", you're being less
precise than the discussion requires.  "info" handles the output of
"grotty".  What is grotty?

grotty(1):
     grotty - groff output driver for typewriter‐like (terminal) devices

If you want groff to format a document for something that isn't a
terminal device, don't tell it to postprocess the document for output to
a terminal!

If "info" has particular needs for the output of a groff pipeline, I
reckon the best approach is to specify what those needs are, and we can
see if someone's willing to write an output driver for it.  As Texinfo
is a fellow GNU Project, I'd personally take seriously the expression of
such demand.

There's a more challenging alternative: "info" could interpret GNU
troff's (or, for that matter, AT&T troff's) output directly, which would
be the equivalent of implementing a *roff postprocessor internally.  I
don't think that would be a great idea, as it would run counter to the
principle of modular design.

The _least_ challenging alternative is the one "info" has historically
selected, which is to pretend that it's a terminal.

And that's where your objections run into trouble.

> Such loss should be weighed against whatever you hope to gain by the
> change.

The "less" pager (another GNU project) has also been around this block.
What does a pager fundamentally do?  It intercepts the output of a
command that writes as if to a typewriter-like device (roughly
speaking[0]), and reprocesses that output to paginate it.

I trust that it is not controversial to observe that, whatever else
the "info" command does when rendering man pages, it paginates them--it
fits them within a cursor-addressable display, and the user can move
back and forth in the document because all of it is buffered by "info".

In days past, people broadly assumed that Unix commands that didn't link
with the curses library (4BSD, 1980) produced output for Teletype
machines.  Further, they assumed that machines receiving that output
were capable of overstriking, and sometimes even that they were capable of
partial line feeds or (more often) reverse line feeds.  That is why
utilities like col(1) and ul(1) exist, and why you find, in the oddest
places, text stream interpreters specially handling backspace
characters.

One of those places was nroff(1), which _never_ output a plain text
stream, but always output tailored to some kind of printing device.  In
Seventh Edition Unix (1979), the default output device was a Teletype
Model 37.

See
<https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1/troff.1>,
a combined man page for "troff" and "nroff".

I quote:

$ groff -rLL=72n -rHY=0 -dAD=l -man -T utf8 -P -cbou \
    ~/src/unix/v7/usr/man/man1/troff.1
...
NAME
     troff, nroff - text formatting and typesetting
...
DESCRIPTION
     Troff formats text in the named files for printing on a Graphic
     Systems C/A/T phototypesetter; nroff for typewriter‐like devices.
     Their capabilities are described in the Nroff/Troff user’s manual.
...
     ... The options, which may
     appear in any order so long as they appear before the files, are:
...
     Nroff only

     -Tname
            Prepare output for specified terminal.  Known names are 37
            for the (default) Teletype Corporation Model 37 terminal,
            tn300 for the GE TermiNet 300 (or any terminal without half‐
            line capability), 300S for the DASI‐300S, 300 for the
            DASI‐300, and 450 for the DASI‐450 (Diablo Hyterm).
...

The important takeaway here is that nroff never at any point produced
plain text streams by default (and maybe not ever--I'm not familiar with
the programming interface to the GE TermiNet 300).  Its output was
_always_ tailored to a terminal device, and used control characters to
achieve formatting by default.

For many years, Unix users generally pretended that standard output was
a Teletype machine and wrote applications on that basis, even when using
video terminals.  And that mostly worked, as long as you didn't try to
do anything crazy like pipe a curses application like Rogue, NetHack, or
vi through a pager.  (Nobody tackled this problem; why would anyone want
to?)  But then, the Linux kernel, a GNU userspace, and the descendants
of BSD Unix showed up on IBM PC-compatible machines with VGA cards.

One of the first things these new users of Teletype-compatible output
wanted was color.  Model 37s didn't have that.  Even XTerm, running in a
frequently color-enabled windowing environment, didn't have that (not at
first, and if you think this story is long, here's another[1]).

Supporting color meant using ECMA-48 (a.k.a. ISO 6429, a.k.a. ANSI X3.64
[withdrawn]) escape sequences, because Model 37 machines were incapable
of printing in color and there was no other mechanism (with any
momentum) for expressing color selection in the output.  People still
wanted to use a Unix terminal driver to manage standard I/O streams
interactively, and the terminal driver was written for and worked with
real terminals, most of which after about 1978 implemented ECMA-48--more
or less.

Maintainers of GNU programs oriented to the command line, who saw no
reason to link their applications with curses (or, more narrowly,
terminfo, or even termcap) to determine which ECMA-48 escape sequences
were safe to use, started hard-coding these escape sequences into the
output.[2]

But that created problems.  The SGR escape sequences of ECMA-48 messed
with the expectations of programs expecting very simple typewriter-style
input (ideally, without even Model 37 features like backspace sequences
or partial or reverse line motions).  So two compromises piled up in
rapid succession: (A) The programs elect not to emit these escape
sequences if the output stream isn't a "TTY", assuming that if it's a
pipe, it won't be; and (B) if that heuristic is unreliable, supply the
user with a `--color={auto,always,never}` option to give them control.
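Compromises (A) and (B) can be sketched in a few lines.  This is only an
illustration under stated assumptions: the function names `want_escapes`
and `emit` are hypothetical, and the mode names simply mirror the
`--color={auto,always,never}` convention described above.

```python
import io
import sys

def want_escapes(mode: str, stream) -> bool:
    """Decide whether to write SGR escape sequences to `stream`."""
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "auto":
        # Compromise (A): guess that anything that is not a TTY
        # (a pipe, a file) cannot interpret escape sequences.
        return stream.isatty()
    raise ValueError(f"unknown color mode: {mode!r}")

def emit(text: str, mode: str = "auto", stream=sys.stdout) -> None:
    """Write `text` in bold if escapes are wanted, plainly otherwise."""
    if want_escapes(mode, stream):
        stream.write(f"\x1b[1m{text}\x1b[0m\n")  # SGR 1 = bold, SGR 0 = reset
    else:
        stream.write(text + "\n")

pipe_like = io.StringIO()          # stands in for a pipe: isatty() is False
emit("NAME", "auto", pipe_like)    # writes plain "NAME\n"
emit("NAME", "always", pipe_like)  # writes "\x1b[1mNAME\x1b[0m\n"
```

The "auto" branch is exactly the heuristic that fails when the pipe leads
to a pager that could have handled the escapes, which is why the explicit
override exists.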

That decision satisfied many, but pushed the problem elsewhere--what if
a consumer of coreutils[3] output wanted the output colorized _and_ to
page it?  I suppose one popular response for a while was, "don't do
that, then".  But that didn't hold, and arguably never should have.

That meant pagers had to adapt.  A long time ago (so long I'm not
looking it up), GNU less augmented its '-r' option, which passes all
control characters through to the terminal raw, with a '-R' counterpart
that passes through only a subset of ECMA-48 escape sequences that its
author deemed safe.  Why?  Because users demanded it, especially those
who wanted to paginate colorized output.

I have little confidence that subscribers to bug-texinfo@gnu actually
read all of that, so here are the bottom lines.

1.  If your program interposes itself between the output of a Unix
    command and the terminal device at which it was directed, whether
    you're an AWK script or a pager, your program has to plausibly
    simulate a terminal.  If you don't, your users will be unhappy.
    There's no essential difference between f^Hfo^Hoo^Hob^H_a^H_r and
    SGR or OSC escape sequences here.  Some techniques are simply older
    than others, and all vary in breadth of deployment.

2.  By running groff in a way that asks for terminal output and by
    buffering it for presentation to the user, "info" is functioning
    like a pager, and must therefore behave enough like a terminal to
    preserve output that is intelligible to the user.

Despite similarities in wording, those points are distinct; you could
get away with less of (1) if you stopped doing (2).
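To make point (1) concrete, here is a minimal sketch of what "plausibly
simulating a terminal" means for the overstriking case.  It is an
assumption-laden illustration, not code from "info", less, or groff: the
name `decode_overstrike` is hypothetical, it handles a single line only,
and its rule for mismatched overstrikes (the last glyph wins) is just one
choice a video-terminal-like consumer might make.

```python
def decode_overstrike(text):
    """Turn Teletype-style overstriking into (char, attributes) cells."""
    cells = []   # each cell is [char, set_of_attributes]
    cursor = 0   # index of the character cell the next glyph lands on
    for ch in text:
        if ch == "\b":
            cursor = max(0, cursor - 1)   # backspace moves left, never past 0
            continue
        if cursor < len(cells):
            cell = cells[cursor]
            if ch == cell[0]:
                cell[1].add("bold")        # same glyph struck twice
            elif cell[0] == "_":
                cell[0] = ch               # underscore first, glyph second
                cell[1].add("underline")
            elif ch == "_":
                cell[1].add("underline")   # glyph first, underscore second
            else:
                cell[0] = ch               # no constructive overstrike
        else:
            cells.append([ch, set()])
        cursor += 1
    return [(c, frozenset(a)) for c, a in cells]

# The sequence quoted in point (1): "foo" comes out bold, "b" and "a"
# underlined, and the final "r" plain.
cells = decode_overstrike("f\bfo\boo\bob\b_a\b_r")
```

A consumer that skips this step shows its users stray backspaces and
doubled letters, which is the same kind of failure as showing them a
literal escape sequence.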

> It is a very weak argument for breaking compatibility with other
> programs that the other programs were not standards-conformant.

The _standard_ way to find out what capabilities a terminal has is to
employ the terminfo library.[4]  Unfortunately, no version of nroff has
ever done that.  At one point I was convinced that this was a gravely
lacking feature in groff, but I've become much less alarmed for the
simple reason that the overwhelming majority of consumers of nroff
output do so via the intermediation of a pager.  (And for those who
produce nroff _without_ paging it, its output is sufficiently
well-behaved, ECMA-48 conformant, and above all unambitious enough that
it seldom draws complaint.)

So that's good news and bad news.  The good news is that people using
ECMA-48-aware pagers enjoy the features SGR and OSC sequences can
deliver.  The bad news is that not everybody knows they're in the pager
(or, more accurately, "terminal output reprocessing") business.

groff did have a big problem here for a long time; while "grotty", the
program that transforms GNU troff's output into terminal output, has
supported options to minutely control its output to select either
ECMA-48-style or "legacy" (overstriking) style output, and even to
switch individual features like reverse video and italics on or off, the
GNU version of the nroff(1) program did not support groff(1)'s `-P`
option to pass those options to the postprocessor!

That wasn't a fatal defect, because a program could always just run
groff(1) itself, state which terminal character encoding ("ascii",
"latin1", "utf8") it wanted, and carry on (as man-db man did).  But it
was troublesome to veterans of Unix and/or AT&T troff and it made a
murky issue even foggier and more frustrating.

I fixed that problem in groff 1.23.0, released 7 July 2023.[5]

You can now simply run nroff, to format a man page or any other *roff
document, and tell it to disable _all_ features that produce anything
but plain text.

Here's the recipe:

nroff -P -cbou

The grotty(1) man page explains the meaning of the `-cbou` options.

> > The GNU Project regards standards published by other organizations
> > as suggestions, not orders. We consider those standards, but we do
> > not “obey” them. In developing a GNU program, you should implement
> > an outside standard’s specifications when that makes the GNU system
> > better overall in an objective sense. When it doesn’t, you
> > shouldn’t.
> 
> https://www.gnu.org/prep/standards/html_node/Non_002dGNU-Standards.html
> 
> As I said, it will likely not only be the "info" program that is
> affected.
> 
> In any case: you would not expect "info" to be ECMA-48 compliant, as
> that is a standard for text terminals, and "info" is not a text
> terminal.  Therefore we would not expect it to be able to interpret
> arbitrary terminal control sequences in input files.  

As noted above, your argument is ill-premised.  "info" is reprocessing
output intended for terminals and has consequently assumed the
responsibility of acting, to some extent, like a terminal.

> Nobody at any point in the history of the development of the program
> has ever gone through the ECMA-48 standard with an aim to comply with
> it.

Doubtless due to the misconception above.  If "info" maintains the
premise that the output of "nroff" is a plain text stream, then the
presence of code that interprets backspace+other-character sequences is
inexplicable.  Moreover, what do you do when, given the sequence a^Hb, b is
neither identical to a (set bold attribute), nor '_' (set underline
attribute)?  On a Model 37, this had familiar and well-defined
semantics; in fact, there weren't three cases but one: you overstrike.
But character-cell video terminals generally don't support constructive
overstriking.

The premise that "nroff" produces a plain text stream is as
unsustainable now as it was 45 years ago.

On the bright side, groff can accommodate you.

> As you probably understand, "info" runs "man" to get textual output,
> which it then displays.  The output of "man" is thought of as a text
> file, rather than raw bytes to be sent to the terminal.

If you want a _plain_ text file from "man", you need to ask for that.
That is not its default, nor is it even possible in any traditional Unix
implementation I'm aware of.  On Seventh Edition Unix (1979), when you
run "man", you get overstriking sequences for boldface and underlining.
Every "man" program I've ever seen supports those, mainly because they
either implement them themselves (mandoc(1)) or hand off rendering to
nroff (every other implementation), which does so itself.

> (Cursor movement sequences, for example, would be inappropriate to
> pass through to the terminal, as these would interfere with info's
> display routines.)

I agree.  That's because "info" is, among other things, a pager.

> This textual outupt can contain some control codes for marking text
> with styles like bold or underline ("SGR" codes), which Info
> recognizes.

Right.  What defines "SGR codes"?  ECMA-48.

> I expect it is likely that other programs reading the output of "man"
> would be likewise limited to supporting those sequences which are
> likely to occur.

Yes, and _if_ they follow the standard's rules about how escape
sequences are formed, they'll implement a small state machine that
interprets desired sequences _and discards the rest_.
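Such a state machine might look like the following sketch.  It assumes
the consumer wants to keep SGR sequences and drop everything else; the
name `keep_sgr_only` is hypothetical, the BEL string terminator is an
xterm convention rather than part of ECMA-48, and the handling of
malformed input is deliberately simplified.

```python
ESC = "\x1b"

def keep_sgr_only(text: str) -> str:
    """Pass text and SGR sequences through; discard other escapes."""
    out = []
    state = "ground"
    seq = ""
    for ch in text:
        if state == "ground":
            if ch == ESC:
                state, seq = "esc", ch
            else:
                out.append(ch)
        elif state == "esc":
            seq += ch
            if ch == "[":
                state = "csi"               # control sequence introducer
            elif ch in "]PX^_":
                state = "string"            # OSC, DCS, SOS, PM, APC
            elif " " <= ch <= "/":
                state = "esc_intermediate"  # longer escape sequence
            elif ch == ESC:
                seq = ch                    # restart on a repeated ESC
            else:
                state = "ground"            # two-character escape: discard
        elif state == "esc_intermediate":
            if not (" " <= ch <= "/"):
                state = "ground"            # final byte ends the sequence
        elif state == "csi":
            seq += ch
            if "@" <= ch <= "~":            # final byte of control sequence
                params = seq[2:-1]
                if ch == "m" and all(c in "0123456789;:" for c in params):
                    out.append(seq)         # well-formed SGR: keep it
                state = "ground"
        elif state == "string":
            if ch == ESC:
                state = "string_esc"
            elif ch == "\a":
                state = "ground"            # BEL terminator (xterm habit)
        elif state == "string_esc":
            if ch == "\\":
                state = "ground"            # ESC \ is the string terminator
            elif ch != ESC:
                state = "string"
    return "".join(out)

sample = ("\x1b[1mbold\x1b[0m \x1b]8;;man:groff(1)\x1b\\groff(1)"
          "\x1b]8;;\x1b\\ \x1b[2Jx")
filtered = keep_sgr_only(sample)   # "\x1b[1mbold\x1b[0m groff(1) x"
```

The point of the design is that the filter never needs to know what an
unfamiliar sequence means; it only needs to know where the sequence ends.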

> (It's possible that versions of "info" older than about 2022 would
> be unaffected, as we had to make a change to set MAN_KEEP_FORMATTING=1
> (on 2022-03-05) to get bold and underlining in manpages, but I haven't
> tested older versions.)

It's possible that the "man" implementation you, or a relevant committer,
used, itself switched from asking grotty (via groff) to supply
overstriking sequences for style changes to SGR escape sequences.  With
more information, I might be able to find out for you.

> > If these problems have gone unraised by Texinfo users for a long
> > time, my surmise is that users of info(1), and of GNU Emacs's WoMan
> > man browser, have such low expectations of their rendering that they
> > disregard any formatting errors they see.
> 
> Users of "info" do not use the program to view content with arbitrary
> terminal control sequences.

No, but they do use it to view content with _ECMA-48_ escape sequences,
a standard that affords extensions and distinguishes well-formed escape
sequences from ill-formed ones.

It's "info"'s responsibility to make a compatible distinction if you
don't want unhappy users.

> The program is limited to viewing Info files (the contents of which
> are predictable and do not contain such sequences), and manpages,
> which also have been limited to containing SGR sequences.

Yes, starting with groff 1.18 (July 2002).[6]

> So the situation you describe of "info" users viewing distorted output
> and being happy with it does not have any reality.

I just observed it myself, hence my bug report.

Of which, I admit, I am now beginning to feel repentant.

> I suggest you consider a slower roll-out of this feature to minimise
> compatability issues.

grotty's OSC 8 support was released with groff 1.23.0 in July 2023.[7]

> Use of SGR sequences in grotty output (which info now benefits from)
> was a similar situation historically.

Yes.

> Previously, groff output "overstrike" sequences for bold and
> underline, but switched to SGR sequences at some point.

Yes, starting with groff 1.18 (July 2002).[6]

> As I remember, this was disabled by distributions (Debian) for many
> years due to the potential for backwards incompatibility.

Backwards (in)compatibility with what, though?  Not terminals or
terminal emulators.

With _pagers_.

> Eventually though, other programs caught up and it was not so harmful
> to make the switch and get the benefits of the new functionality.

Right.  And the same thing is happening here.  That groff 1.23.0 has
been out for 2½ years and I'm apparently the first person to bring the
issue to this list's attention tells me that the population of people
using a groff of anything like recent vintage and also employing "info"
to view man pages is pretty small.  What conclusion would you reach?

As "info" is already a hypertext system, it'd be nice if it could use
hyperlink information that man pages rendered by groff 1.23.0 can offer.
That would necessitate interpreting OSC 8 sequences instead of
discarding them.
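As a rough sketch of what interpreting those sequences could involve,
the following splits rendered text into spans, each paired with the
hyperlink target in force for it.  The function name `split_hyperlinks`
is hypothetical, the "man:" URI in the sample is an illustrative
assumption rather than a statement of what grotty emits, and accepting
BEL as a terminator follows terminal-emulator practice, not ECMA-48.

```python
import re

# OSC 8 has the form: ESC ] 8 ; params ; URI ST, where ST is ESC \.
_OSC8 = re.compile(r"\x1b\]8;([^;\x1b\x07]*);([^\x1b\x07]*)(?:\x1b\\|\x07)")

def split_hyperlinks(text):
    """Return a list of (text, uri_or_None) spans."""
    spans = []
    uri = None   # link target currently in force, if any
    pos = 0
    for m in _OSC8.finditer(text):
        if m.start() > pos:
            spans.append((text[pos:m.start()], uri))
        uri = m.group(2) or None   # an empty URI closes the current link
        pos = m.end()
    if pos < len(text):
        spans.append((text[pos:], uri))
    return spans

sample = "See \x1b]8;;man:grotty(1)\x1b\\grotty(1)\x1b]8;;\x1b\\ for details."
spans = split_hyperlinks(sample)
# [("See ", None), ("grotty(1)", "man:grotty(1)"), (" for details.", None)]
```

A hypertext browser could then treat each span with a non-empty target
as a navigable cross-reference and render the rest as ordinary text.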

I'm curious to learn of your interest in such a feature.

Regards,
Branden

[0] By "typewriter-like" I mean not "like a Teletype Corporation Model
    37", the default output device for Unix/AT&T nroff, but more like "a
    device that organizes its output into character cells and either
    does not support or does not employ arbitrary cursor positioning."
    This definition is a poor description of both the Model 37 and of
    any video terminal, but is a better fit for the small set of
    features that it had in common with the DEC VT 100 video terminal.
    This small set has become the lingua franca that "plain text
    streams" in Unix systems express.  In practice, all of these
    "typewriter-like" devices support extensions to this least common
    denominator; the manner of their expression and the expectation
    thereof is the crux of the disagreement in this discussion.

[1] https://invisible-island.net/xterm/xterm.faq.html#xterm_terminfo
[2] https://cgit.git.savannah.gnu.org/cgit/coreutils.git/commit/?id=c65e1fe89f81eaf82ecbff92efbc924cdca541cf
[3] back then: fileutils, sh-utils, and textutils
[4] https://pubs.opengroup.org/onlinepubs/9699909599/toc.pdf
[5] https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/NEWS?h=1.23.0#n86
[6] https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/NEWS?h=1.23.0#n2210
[7] https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/NEWS?h=1.23.0#n548
