Hi Gavin,
At 2026-01-11T16:15:25+0000, Gavin Smith wrote:
> On Sun, Jan 11, 2026 at 08:01:29AM -0600, G. Branden Robinson wrote:
> > If info cannot interpret these escape sequences, it should discard
> > them.
> >
> > If info cannot parse these sequences well enough to reliably discard
> > them, it should ask man(1) or {g,n}roff(1) not to generate them.
>
> It is probably easy enough to discard them. We could discard all OSC
> sequences.

That would be consistent with the expectations of ECMA-48, I think.[0]
I can't find the exact sentence in the 5th edition (1991) of that
standard that mandates the discard of unsupported control sequences,
but I'm willing to research the issue more deeply if you maintain that
info(1) is behaving conformantly with it.

> I'd never seen the problem before, but today I ran "info grotty" on
> my system and saw the misdisplayed sequences.
>
> Viewing the manpage for "groff" via info gives very deformed output;
> it is practically unusable. You don't just output these sequences
> for web URLs, but also use "man:*" URLs for any references to other
> manpages.

Yes. Those are man page hyperlinks, a new feature of groff 1.23.0.[1]
They're supported by many applications, as noted in the "OSC 8 Adoption"
link I shared previously, including the gnome-terminal emulator program,
and by the less(1) pager, which in recent versions binds key sequences
starting with ^O to hyperlink navigation features.[2] That's useful on
terminal emulators that don't support OSC 8 (but correctly ignore
sequences they don't support), like xterm.

> Fortunately, it seems that not too many manpages are generated with
> these sequences, except groff's own manpages. I suggest you do not
> start outputting these sequences by default for any manpage
> cross-references, otherwise there are too many.

On the contrary, the plan is for wider adoption. Alejandro Colomon of
the Linux man-pages project has been waiting on me for a while to finish
submitting a series of patches that would convert the 3,100 or so man
pages that project distributes to use of groff man(7)'s `MR` macro,
introduced in groff 1.23.0, which enables production of the hyperlinks.

> The occasional web URL is probably ok.
>
> This change to groff output also breaks any other program that would
> use the output from "man".

It breaks programs that don't correctly support ECMA-48.
Unsupported or malformed escape sequences must be discarded, not emitted
literally.

At 2026-01-11T16:26:57+0000, Gavin Smith wrote:
> I should add that Info runs on a wide variety of Unix-like systems,

...as does groff.

> not just those using groff or particular versions of "man", so using
> particular command-line invocations or setting particular environment
> variables is unlikely to be reliable.

Only somewhat true, which is one reason I proposed two mechanisms for
the "info" program to collect man page text, since I don't know
precisely which technique it uses.

> For example, you suggested setting MANROFFOPT=-rU0, which looks like
> an option to be passed to a "roff" program by "man", but the program
> might not recognise that option, and then you might not get a manpage
> at all.

Your objection is premised on incomplete information.

First, setting the "MANROFFOPT" environment variable will indeed have no
effect with man(1) programs other than man-db man(1). Brouwer/Lucifredi
man, formerly used by Red Hat, has been defunct for over 10 years.[3]
Other man(1) programs still in use include Solaris's, which runs System
V nroff (or troff) on Solaris 10, and an old version of groff on Solaris
11;[4] FreeBSD's "man" shell script; and mandoc(1)'s man program--all of
which will ignore it harmlessly, like any other unrecognized environment
variable.[5]

Second, a "-rU0" command-line option will in fact be recognized by any
"roff" program except the one actually called "roff", which to the best
of my knowledge last shipped in 2.9BSD in 1983.[6] All lineages of
nroff/troff since then support the '-r' command-line option.[7]

What '-rU0' does is direct the formatter to assign the register named
'U' the value '0'. In *roff formatters, this is the same result as not
specifying it at all, since registers don't have to be declared before
use. The formatter automatically assigns registers values of zero if
they are dereferenced before being defined.
However, when using groff man(7), this command-line option overrides any
existing register assignment, as might be done in the "troffrc" file or
a macro package. Since *roffs process command-line string and register
definitions before loading macro packages specified with the '-m'
option, what groff man(7) actually does is use a GNU troff extension to
check if the 'U' register is defined at all; if it is, it must have been
at the command line, so it does not override the value the user
specified.[8] The net result is the same.

Passing '-rU0' to a *roff will not cause a document to fail to render
unless the document programs itself not to render in that circumstance.
There is a remote possibility that a man page employs a 'U' register for
its own purposes, but this is vanishingly unlikely; I've reviewed
hundreds of man pages and grepped thousands. The extremely few man(7)
authors who define registers at all attempt nothing so dramatic, and
they also tend to avoid use of single-letter register names. (The lone
exception I'm aware of is the use of an 'F' register to control index
entry emission by perlpod, a man(7) document _generator_.) man(7)
document authors are generally not sophisticated in their exercise of
formatter features, which sometimes frustrates them but also makes
man(7) document composition simpler than it would otherwise be.

grotty's OSC 8 feature was planned and implemented with substantial
consideration, field trials, and user consultation.

The root of the problem observed is info(1)'s poor conformance with
ECMA-48. If these problems have gone unraised by Texinfo users for a
long time, my surmise is that users of info(1), and of GNU Emacs's WoMan
man browser, have such low expectations of their rendering that they
disregard any formatting errors they see.
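
For concreteness, here is a minimal sketch of what "discarding OSC
sequences" can look like. It is Python for brevity, not info(1)'s code,
and the names `OSC_RE` and `strip_osc` are made up for illustration. It
assumes an OSC sequence begins with ESC ] and ends at ST (ESC \) or, as
some terminal emulators also accept, BEL; the 8-bit C1 forms are not
handled.

```python
import re

# Assumption: 7-bit OSC only.  Introducer is ESC ], terminator is
# ST (ESC \) or BEL.  The body may not contain ESC or BEL.
OSC_RE = re.compile(r"\x1b\][^\x07\x1b]*(?:\x1b\\|\x07)")


def strip_osc(text: str) -> str:
    """Return text with every complete OSC sequence removed."""
    return OSC_RE.sub("", text)


# A hyperlink of the general OSC 8 shape wrapped around a man page
# cross reference; the link text itself is left untouched.
sample = "See \x1b]8;;man:groff(1)\x1b\\groff(1)\x1b]8;;\x1b\\ for details."
print(strip_osc(sample))
```

An unterminated sequence is left alone by this sketch; a real consumer
would have to decide how to recover from one.
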
I observe that, for example, WoMan, which apparently attempts to parse
man(7) document input for itself instead of entrusting it to a man(1)
program or to the nroff command, misrenders `\c` and `\:` escape
sequences.[9]

Regards,
Branden

[0] http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf
[1] https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/NEWS?h=1.23.0#n223
[2] https://www.greenwoodsoftware.com/less/news.661.html
[3] https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/migration_planning_guide/chap-red_hat_enterprise_linux-migration_planning_guide-changes_to_packages_functionality_and_support
[4] Explicit documentation of these facts seems scarce. With an
    account, one can confirm this first-hand on gcc210.fsffrance.org
    and gcc211.fsffrance.org.
[5] https://cgit.freebsd.org/src/tree/usr.bin/man/man.sh
[6] https://minnie.tuhs.org/cgi-bin/utree.pl?file=2.9BSD/usr/man/cat1/roff.1

    You can use the search form on the parent page,
    <https://minnie.tuhs.org/cgi-bin/utree.pl>, to look for other
    occurrences of "/roff.1". You will observe that it's missing from
    most descendants of Seventh Edition Unix: (a) Eighth Edition Unix
    [1985]; (b) Unix System III [1980]; (c) 3BSD [1980], and the VAX
    port of Seventh Edition, 32/V [1980].
[7] https://www.tuhs.org/cgi-bin/utree.pl?file=V10/vol2/troff/cstr.54
    https://www.troff.org/54.pdf (rendered form)
[8] https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/tmac/an.tmac?h=1.23.0#n1495
[9] My installed version of Emacs is pretty old, though (27.1); maybe a
    newer release has fixed these defects. I would direct the WoMan
    author/maintainer to the "Portability" section of
    groff_man_style(7), a joint effort by mandoc(1) maintainer Ingo
    Schwarze and myself to describe a subset of man(7)+troff that
    developers of standalone man page formatters should support.
