On Mon, Mar 09, 2026 at 05:33:50PM +0000, Gavin Smith wrote:
> On Mon, Mar 09, 2026 at 12:41:27PM +0000, Werner LEMBERG wrote:
> >
> > [2051fde55ac67d92c0532c0483a657aef092b6cb]
> >
> >
> > Consider the following input.
> >
> > ```
> > \input texinfo
> >
> > @top Top
> >
> > (@uref{https://foo.bar/baz.html,
> > baz})
> >
> > (@uref{https://foo.bar/baz.html, baz})
> >
> > @bye
> > ```
> >
> > If I process this with `texi2any --html`, I get the following in the
> > output
> >
> > ```
> > <p>(<a class="uref" href="https://foo.bar/baz.html"> baz</a>)
> > </p>
> > <p>(<a class="uref" href="https://foo.bar/baz.html">baz</a>)
> > </p>
> > ```
> >
> > Why is there a difference in whitespace handling? This smells like a
> > bug.
>
> I think they should be both output as the second output, with the whitespace
> skipped on the new line before "baz". I notice they are both output the same
> with texinfo.tex.
>
> In the Info output, in the first usage, a single " " is output before
> the "baz", thus: '( baz (https://foo.bar/baz.html))'.
>
> In the tree output, the output is this:
With -c DUMP_TREE=1, the tree structure is more visible:
*@uref C2 space_in_uref.texi:l5
*brace_arg C1
{https://foo.bar/baz.html}
*brace_arg C1
|INFO
|spaces_before_argument:
|{spaces_before_argument:\n}
{ baz}
The end of line is in spaces_before_argument, but the spaces on the next
line are with baz.
> I expect the leading spaces on the next line should be included inside
> this "before" whitespace, so it should be parsed as:
>
> brace_arg b/\n /
> |baz|
>
> Hopefully Patrice will comment.
I think that it is the consequence of two rules in the Texinfo tree:
* a new line ends a text element
* there is only one spaces_before_argument element associated with an
element (here brace_arg).
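
To make the interaction of these two rules concrete, here is a toy model in
Python. It is not the texi2any parser code; the function name
split_leading_spaces and the regular expression are only assumptions made
for illustration. It reproduces the split seen in the dump above.

```python
import re

def split_leading_spaces(raw):
    """Toy model: return (spaces_before_argument, content) for the raw
    text that follows the comma in a brace command argument."""
    # Rule 1: a new line ends a text element, so the leading-space element
    # stops right after the first newline found among the leading spaces.
    # Rule 2: only one such element is kept, so any spaces after that
    # newline stay with the argument content.
    match = re.match(r"[ \t]*\n?", raw)
    before = match.group(0)
    return before, raw[len(before):]

print(split_leading_spaces("\n baz"))  # first @uref in the report
print(split_leading_spaces(" baz"))    # second @uref in the report
```

In this model the first call leaves the space in front of baz, the second
does not, which matches the difference in the HTML output.
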
There could also be some difficulty in detecting that the spaces after
the new line continue the spaces before the argument, but that is
probably secondary, as empty lines within brace commands are already
detected.
Something similar happens with the following (there are spaces after aaa):
(@uref{aaa
, bbb})
The spaces after aaa are kept with aaa, while the spaces before the comma
are in spaces_after_argument:
*@uref C2 space_in_uref.texi:l10
*brace_arg C1
|INFO
|spaces_after_argument:
|{spaces_after_argument: }
{aaa \n}
*brace_arg C1
|INFO
|spaces_before_argument:
|{spaces_before_argument: }
{bbb}
Note that for indicating commands with one argument, the spaces stay with
the argument; there is no spaces_before_argument/spaces_after_argument:
@strong{
h
}
*@strong C1 space_in_uref.texi:l13
*brace_container C2
{ \n}
{ h \n}
I do not remember any discussion about that. If there was one, it would
have been when writing the Perl texi2any parser a long time ago.
My feeling is that it would be more consistent, when there is
spaces_before_argument/spaces_after_argument, to have all the spaces
together, though not in the same text element. So
spaces_before_argument and spaces_after_argument should become arrays of
text elements.
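
A rough sketch of the idea, again as a toy Python model and not actual
parser code (the function name and the regular expression are assumptions
for illustration only): leading spaces are collected as a list of text
elements, each one still ending at a new line.

```python
import re

# One run of spaces/tabs, optionally closed by a newline.
_SPACE_ELEMENT = re.compile(r"[ \t]*\n?")

def split_leading_spaces_as_list(raw):
    """Toy model: return (list of leading-space text elements, content)."""
    elements = []
    pos = 0
    while True:
        match = _SPACE_ELEMENT.match(raw, pos)
        if match.end() == pos:
            # Nothing more to consume: the argument content starts here.
            break
        # A new line still ends a text element, so each element holds at
        # most one newline, but all of them are kept together in the list.
        elements.append(match.group(0))
        pos = match.end()
    # Note: this toy model does not treat empty lines specially.
    return elements, raw[pos:]

print(split_leading_spaces_as_list("\n baz"))
print(split_leading_spaces_as_list(" baz"))
```

With such a structure both @uref calls from the report end up with the same
content, baz, and converters could ignore the whole list.
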
--
Pat