Hi Alex & Helge,

At 2025-08-25T20:57:22+0200, Alejandro Colomar wrote:
> On Mon, Aug 25, 2025 at 04:17:32PM +0000, Helge Kreutzmann wrote:
> > Am Sun, Aug 24, 2025 at 10:04:04PM +0200 schrieb Alejandro Colomar:
> > > On Sun, Aug 24, 2025 at 02:48:46PM +0000, Helge Kreutzmann wrote:
> > > > Without further ado, the following was found:
> > > > 
> > > > Issue:    The URL is invalid
> > > > 
> > > > "For maximum interoperability, programs and users should also limit the "
> > > > "characters that they use for their own pathnames to characters in the POSIX "
> > > > "E<.UR https://pubs.opengroup.org/\\:onlinepubs/\\:9799919799/\\:basedefs/"
> > > > "\\:V1_chap03.html#tag_03_265> Portable Filename Character Set E<.UE .>"
> > > 
> > > Hi Helge,
> > > 
> > > That URI has '\\:' in it, which is correct in roff(7) (and in man(7))
> > > source code.  That is removed by troff(1) when formatting the page.
> > > If you read the formatted page that's not there.
> > 
> > Yes, then no URL is there :))
> 
> Hmmm, that depends on your terminal.  If there's no URL or hyperlink,
> this might be an issue in either the terminal or groff(1).

I need clarification on what you're seeing, Helge.

The presence or absence of `\:` escape sequences should not make the
entire URL fail to format.  Whether the URL is visible depends on
whether the output device can hyperlink it.

groff_man(7):
     .UR uri
     .UE [trailing‐text]
            Identify uri as an RFC 3986 URI hyperlink with the text
            between the two macro calls as the link text.  An argument
            to UE is placed after the link text without intervening
            space.  uri may not be visible in the rendered document if
            hyperlinks are enabled and supported by the output driver.
            If they are not, uri is set in angle brackets after the link
            text and before trailing‐text.  If hyperlinking is enabled
            but there is no link text, uri is formatted and hyperlinked
            without angle brackets.
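
For reference, here is roughly what the man(7) source behind the PO
string quoted above would look like.  This is a sketch reconstructed
from that string, not copied from the page itself; I'm assuming the
doubled backslashes are only PO-file escaping and that `E<...>` is
po4a's way of carrying the macro calls inline.

```roff
.\" Sketch only; reconstructed from the PO string quoted above.
For maximum interoperability,
programs and users should also limit the characters
that they use for their own pathnames to characters in the POSIX
.UR https://pubs.opengroup.org/\:onlinepubs/\:9799919799/\:basedefs/\:V1_chap03.html#tag_03_265
Portable Filename Character Set
.UE .
```

Going by the description above, a device that hyperlinks should show
only the link text "Portable Filename Character Set" followed by the
period, while one that does not should set the URI in angle brackets
between the link text and the period.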

As far as I can tell, groff man's `UR` and `UE` extension macros were
designed to degrade well on systems that don't implement them; recall
that the man(7) macro language was designed in 1979 and did not
anticipate hypertext.  (mdoc(7), sometimes touted as an alternative, was
designed in about 1990 and had a similar lacuna--but like man(7), later
saw a groff extension to fill the gap.)

Since the link text itself is not in the arguments to a (possibly
undefined) macro, it should get formatted in the page.  A _man_
formatter that doesn't implement `UE` might leave off some trailing text
(usually punctuation), but that too can be worked around portably[1] if
one cares to.

.TH foo 1 2024-08-25 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
Visit
.UR https://my.example.com
my awesome website\c
.if \n(.g \~
.UE \c
\&.

Admittedly, the supply of man page maintainers concerned about
portability to DWB, Solaris 10, or Plan 9 troffs seems to be dwindling.
I've never seen any page go to the foregoing trouble.

> > > The effect of '\\:' is telling troff(1) that those are good points
> > > to break the line if needed.
> > 
> > Thanks for the explanation. Checking the URL after removing the \\:
> > is a valid URL.

It's worth noting that `\:` is also a groff extension; this time to the
formatter, and dating back to about 1990.

     \:        Insert a non‐printing break point.  A word can break at
               such a point, but a hyphen glyph is not written to the
               output if it does.  The remainder of the word is subject
               to hyphenation as normal.  You can use \: and \% in
               combination to control breaking of a file name or URI or
               to permit hyphenation only after certain explicit hyphens
               within a word.  See subsection “Hyperlink macros” above
               for an example.

               \: is a GNU extension also supported by Heirloom Doctools
               troff 050915 (September 2005), mandoc 1.13.1
               (2014‐08‐10), and neatroff (commit 399a4936, 2014‐02‐17),
               but not by DWB, Plan 9, or Solaris 10 troffs.

There's a portability workaround for that, too.  Here's a real-world
example.[2]
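
The general shape of such a workaround, as I understand it, is to
define a string that interpolates `\:` only when the formatter claims
groff compatibility, and to use that string in place of the escape
sequence.  The following is a minimal sketch, not the ncurses source
verbatim; the string name `:` and the URL are placeholders I chose for
illustration.

```roff
.\" Hypothetical sketch: the string name ":" is arbitrary.
.\" Register .g is 1 under groff; a formatter lacking it reads it as 0.
.ie \n(.g .ds : \:
.el .ds :
.
.UR https://my.example.com/\*:some/\*:long/\*:path
my awesome website
.UE .
```

A formatter that does not set the `.g` register gets an empty string,
so the URL passes through unbroken rather than picking up an escape
sequence it does not recognize.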

I mention these issues because Helge's project takes in a huge variety
of man pages.

Regards,
Branden

[1] except to po4a: https://github.com/mquinson/po4a/issues/527
[2] https://github.com/ThomasDickey/ncurses-snapshots/blob/ec918320a42c0dd57c1ea8481419bcaf862d16fd/man/curs_getch.3x#L46
    https://github.com/ThomasDickey/ncurses-snapshots/blob/ec918320a42c0dd57c1ea8481419bcaf862d16fd/man/curs_getch.3x#L783
