Update of bug #64202 (project groff):

                 Summary: [man-pages]: groff_man(7) inconsistently (and
redundantly) guards some .MR references with '\%' => [man pages] groff_man(7)
inconsistently (and redundantly) guards some .MR references with '\%'

    _______________________________________________________

Follow-up Comment #4:

[comment #3 comment #3:]
> [comment #1 comment #1:]
> > Hi Keith,
> > 
> > I'm aware of this.  It's deliberate insofar as it's a consequence of other
> > decisions.
> > 
> > The main facts are these:
> > 
> > 1.  The new `MR` macro unconditionally prefixes its first argument with a
> > `\%` escape sequence to suppress hyphenation.
> 
> That's what I thought.  Consequently, there is _absolutely_ no need for
> references, such as '.MR \%topic n', to _ever_ add that redundant '\%' prefix
> to the topic name.

...and there are no cases of it doing so in the groff tree,


$ git grep 'MR.*\\%' || echo NONE
NONE


so your stridency here is a bit puzzling.
 
> > 2.  All of _groff_'s man pages (.[157] files) are produced in the build
> > tree from .man inputs.
> 
> Again, I'm well aware of this, but the '*.man' sources _do not_ specify the
> redundant prefix,

Agreed.

> (other than incidentally, via a malformed transform for a '@g@' prefix).

That's not incidental, it's deliberate.
 
> And, therein lies the bug ... for it _is_ a bug.  The intent of '@g@' is to
> add a program name prefix -- typically 'g' for GNU programs -- so that 'tbl'
> becomes 'gtbl', when appropriate; it has _absolutely no business_ to _ever_
> include '\%' as part of that prefix.

Why not?  In DWB, Heirloom Doctools, and GNU troffs alike, the `\%` escape
sequence is idempotent when repeated at the beginning of a word.


$ cat EXPERIMENTS/hyphenation-point.roff 
.ll 3n
foo
\%foo
\%\%foo
\%\%\%foo
A \%\%\%foo
AB \%\%\%foo
ABC \%\%\%foo
ABCD \%\%\%foo
.pl \n(plu
$ nroff -Wbreak EXPERIMENTS/hyphenation-point.roff
foo
foo
foo
foo
A
foo
AB
foo
ABC
foo
ABCD
foo


(I suppressed warnings because they're not relevant here; only spurious
hyphens at the start of a word would be, and those would be visible in the
output anyway.  I also tried all three formatters with line lengths of 4n and
5n; this also failed to cause spurious hyphenation.)

Formatters are prepared to handle inputs like this, and so too should macro
packages be, if they want to claim general utility.

> (FWIW, the seat of the bug is within the substitution for '@g@', as it is
> specified in the generated Makefile, at the point where 'topic.n' is generated
> from 'topic.n.man').

It's done for some other replacements as well.


commit d84d9e1d85287b24d14001a6fdcbaa9cfc588d55
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Sun Feb 20 05:21:36 2022 +1100

    Makefile.am: Use hyphenation control escapes more.
    
    * Makefile.am (.man): Prefix hyphenation control escape sequences to
      more configuration-time interpolations to prevent their hyphenation:
      @DEVICE@, @g@, @INDEX_SUFFIX@, @PAGE@, @TMAC_{AN,M,S}_PREFIX@,
      @TMAC_MDIR@.


(That commit message is a little unfortunate.  It should say
"configuration-dependent", not "configuration-time".)

> Understood.  However, the intent of '@g@' should _not_ be subverted, for
> this unrelated purpose ... either specify '\%' _explicitly_, in any context
> where it is intended, or introduce a specific transform, other than '@g@'
> itself, which implies the effect of '\%@g@'.

The purpose is not being subverted.  You said yourself that the "seat" of this
behavior is the Makefile rule for generating .[157] files from .man inputs.  It
would be wrong to add the escape in "makevarescape.sed", for instance, because
'@g@' and friends get expanded in contexts other than _roff_ sources.
Moreover, valid _roff_ input is indeed being produced.
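For illustration, here is a minimal sketch of the kind of
configuration-dependent substitution the .man rule performs.  It is not the
actual rule from Makefile.am; the `expand_g` function and the sample
'.MR @g@tbl 1' input are hypothetical.  It shows that the `\%` lands in front
of the expansion even when the configured prefix is empty.

```sh
#!/bin/sh
# Hypothetical sketch, not groff's actual Makefile.am rule: replace each
# @g@ placeholder in roff text read from standard input with a '\%'
# hyphenation control escape followed by the configured program-name
# prefix.  Assumes the prefix contains no '/', '&', or '\' characters.
expand_g() {
    # $1: program-name prefix chosen at configuration time (may be empty)
    sed -e 's/@g@/\\%'"$1"'/g'
}

printf '%s\n' '.MR @g@tbl 1' | expand_g g    # prints: .MR \%gtbl 1
printf '%s\n' '.MR @g@tbl 1' | expand_g ''   # prints: .MR \%tbl 1
```

Either way, `MR` then prepends its own `\%`, so the formatter sees the escape
twice at the start of the word, which the experiment above shows to be
harmless.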

> I think that this is an insidious bug, which should be fixed.

I checked out your attached PDF and it looks quite nice to me.  The problem
with the hyperlinks is clear, and as you described; a stray percent sign is
getting into some of the hyperlink targets you generate.  This is not the
fault of the formatter or the man page sources.  If it were, then the
hyperlinks that groff Git produces would have the same problem.


$ ./build/test-groff -t -rU1 -man -Tutf8 -Z ./build/tmac/groff_man.7 | grep 'x X' | tail -n 20
x X devtag:.NH 1
x X devtag:.eo.h
x X tty: link man:tbl(1)
x X tty: link
x X tty: link man:eqn(1)
x X tty: link
x X tty: link man:refer(1)
x X tty: link
x X tty: link man:man(1)
x X tty: link
x X tty: link man:groff_mdoc(7)
x X tty: link
x X tty: link man:groff_man_style(7)
x X tty: link
x X tty: link man:groff(7)
x X tty: link
x X tty: link man:groff_char(7)
x X tty: link
x X tty: link man:man(7)
x X tty: link
$ ./build/test-groff -t -man -Thtml -Z ./build/tmac/groff_man.7 | grep 'x X' | tail -n 25
x X devtag:.br
x X html:<a href="man:tbl(1)">
x X html:</a>
x X html:<a href="man:eqn(1)">
x X html:</a>
x X html:<a href="man:refer(1)">
x X html:</a>
x X devtag:.sp 1
x X devtag:.br
x X html:<a href="man:man(1)">
x X html:</a>
x X devtag:.sp 1
x X devtag:.br
x X html:<a href="man:groff_mdoc(7)">
x X html:</a>
x X devtag:.sp 1
x X devtag:.br
x X html:<a href="man:groff_man_style(7)">
x X html:</a>
x X html:<a href="man:groff(7)">
x X html:</a>
x X html:<a href="man:groff_char(7)">
x X html:</a>
x X html:<a href="man:man(7)">
x X html:</a>


This is why I mentioned the following point in comment #1.

> You do not _need_ to sanitize content destined for device control escape
> sequences (or the `device` request) of the `\%` escape sequence.  The
> formatter will ignore this escape sequence in that context, skipping over it
> without diagnostic, and it will not appear in the "x X" commands that GNU
> troff produces.  This is already the case in groff 1.22.4 and therefore I
> suspect it's been true for many years.
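One quick way to check that claim against intermediate output is to filter
for link targets containing a percent sign.  The sketch below is
hypothetical; the `stray_percent_links` name is illustrative, and its pattern
covers only the tty and html link forms shown in the listings above.

```sh
#!/bin/sh
# Hypothetical helper: read GNU troff intermediate output (as written by
# 'groff -Z') on standard input and print every device control ("x X")
# command whose man: link target contains a percent sign.  Only the tty
# and html forms seen in groff_man(7) output are matched.
stray_percent_links() {
    grep -E '^x X (tty: link man:|html:<a href="man:).*%'
}

# Example use; grep exits with status 1 when nothing matches.
# ./build/test-groff -t -rU1 -man -Tutf8 -Z ./build/tmac/groff_man.7 \
#     | stray_percent_links || echo NONE
```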

Are you wrapping or replacing the `MR` macro and "sanitizing" its first
argument for some other purpose?  You said:

> (which, in its present state of development, does not incur any address
sanitizer overhead)

...which I didn't completely understand, as ASAN doesn't seem relevant to the
present discussion of _roff_ macro processing.
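If your implementation does strip `\%` itself, for instance in a tool that
reads man page sources directly rather than the formatter's output, it would
need to remove every occurrence, not just the first, to match the
formatter's treatment of repeated escapes.  A minimal sketch under that
assumption, with a hypothetical `man_uri` function:

```sh
#!/bin/sh
# Hypothetical sketch: build a man: URI from the arguments of an .MR call,
# deleting every '\%' escape in the topic (not just the first), so that
# '\%\%gtbl' and 'gtbl' yield the same target.
man_uri() {
    # $1: topic as written in the .MR call; $2: manual section
    topic=$(printf '%s\n' "$1" | sed -e 's/\\%//g')
    printf 'man:%s(%s)\n' "$topic" "$2"
}

man_uri '\%\%gtbl' 1    # prints: man:gtbl(1)
```

A variant that deleted only the first match (omitting sed's `g` flag) would
turn '\%\%gtbl' into 'man:\%gtbl(1)', which is one plausible way for escape
residue to reach a hyperlink target.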

Leaving in "Need Info" status, as I'm stuck; I don't agree with your
implication that repeated leading \% escape sequences in a word are invalid
_roff_ input, and I don't have enough insight into the implementation you're
working on to offer advice.  Maybe you could share some of its code.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64202>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/

