Re: Hyphens in man pages

2023-10-24 Thread Paul Wise
On Tue, 2023-10-24 at 05:40 +, Tobias Frost wrote:
> On 24 October 2023 03:43:29 UTC, Paul Wise wrote:
> 
> > BTW: as a Debian member, you have access to a gratis subscription:
> > 
> > https://wiki.debian.org/MemberBenefits#LWN
> 
> AFAIK this is no longer available.

I am still using the Debian LWN subscription myself.

See these mails in debian-private archives for details:

<5410c39b-3a13-a4d8-ac12-145c1e584...@debian.org>


-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Re: Hyphens in man pages

2023-10-24 Thread Tobias Frost
On 24 October 2023 03:43:29 UTC, Paul Wise wrote:

>BTW: as a Debian member, you have access to a gratis subscription:
>
>https://wiki.debian.org/MemberBenefits#LWN

AFAIK this is no longer available.



Re: Hyphens in man pages

2023-10-23 Thread G. Branden Robinson
Hi Jon,

At 2023-10-23T17:24:30-0600, Jonathan Corbet wrote:
> When I wrote the article (last week) it was a lot closer :)
> 
> As for why: Debian may have resolved this, for now at least, but this
> is an issue that is sure to crop up in distributions that are not as
> quick to pick up new groff releases.  In that sense, the topic is very
> much timely.

Interestingly (perhaps), this wasn't the first thing that caught fire
after the groff 1.23.0 release.  What was, was a hack that Debian took
out of its man.local file to force SGR escape sequences off in
grotty(1).  But other distributions, like Arch and its many derivatives,
still had it and noticed when it quit working, because they also use the
less(1) termcap_XXX environment variable hack (until relatively
recently, undocumented) to convert overstriking character sequences
into...SGR escape sequences for selecting colors.

That was
.

It took nearly four months for a hyphen-minus debacle to flare up.
Another two and I would have decided I'd managed to duck controversy.

> And as for the groff release ... I sure wish we could find another
> writer to hire to help us restore some of our wider community coverage.
> If you know any likely candidates, do send them our way.

Could you update a URL in the article to be more robust?

https://git.savannah.gnu.org/cgit/groff.git/tree/PROBLEMS#n64

But this points to the master branch, so the text at that line number
will not be stable over time, and in fact it's not correct for the
"PROBLEMS" file in the groff 1.23.0 release.  (We managed to fix some
problems afterward.)

This URL would be better:

https://git.savannah.gnu.org/cgit/groff.git/tree/PROBLEMS?h=1.23.0#n82

I also shifted the line address a little to make it more obvious how old
this issue really is.
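Here is a small sketch of the pinning Branden describes. It assumes cgit's URL shape exactly as it appears in his two links: `tree/PATH`, a `?h=` query naming the branch or tag, and a `#nLINE` anchor. `pinned_cgit_url` is a hypothetical helper, not part of cgit:

```python
def pinned_cgit_url(repo: str, path: str, ref: str, line: int) -> str:
    """Build a cgit link that names a release tag instead of the moving
    master branch, so the line anchor keeps pointing at the same text."""
    # ?h= selects the ref (here a tag); #nLINE jumps to that line number.
    return f"{repo.rstrip('/')}/tree/{path}?h={ref}#n{line}"


print(pinned_cgit_url("https://git.savannah.gnu.org/cgit/groff.git",
                      "PROBLEMS", "1.23.0", 82))
```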

Regards,
Branden




Re: Hyphens in man pages

2023-10-23 Thread Paul Wise
On Mon, 2023-10-23 at 11:17 -0500, G. Branden Robinson wrote:

> https://lwn.net/Articles/947941/
> 
> Would someone be willing to send me a subscriber-sponsored link to it?

BTW: as a Debian member, you have access to a gratis subscription:

https://wiki.debian.org/MemberBenefits#LWN

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Re: Hyphens in man pages

2023-10-23 Thread Jonathan Corbet
"G. Branden Robinson" writes:

> I am disappointed that LWN did not cover the groff 1.23.0 release, its
> first in four years and featuring over 400 bug fixes,[1] even as a tiny
> blurb on the back page, but seizes upon this issue, which quietly died
> down a week ago, and should therefore have aged out of the current
> window of "weekly" news.

When I wrote the article (last week) it was a lot closer :)

As for why: Debian may have resolved this, for now at least, but this is
an issue that is sure to crop up in distributions that are not as quick
to pick up new groff releases.  In that sense, the topic is very much
timely.

And as for the groff release ... I sure wish we could find another
writer to hire to help us restore some of our wider community coverage.
If you know any likely candidates, do send them our way.

Thanks,

jon



Re: Hyphens in man pages

2023-10-23 Thread G. Branden Robinson
At 2023-10-23T11:17:07-0500, G. Branden Robinson wrote:
> Would someone be willing to send me a subscriber-sponsored link to it?

Thanks to the three fleet-fingered folks who supplied me with one.  I am
amply equipped to resume my crusade against ignorance and
misinformation...except...

Now that I have undertaken to attempt to shed light on a comment to the
article, of course LWN itself goes down.[1]

Because of course it did.

Regards,
Branden

[1] https://downforeveryoneorjustme.com/lwn.net

"It's not just you! lwn.net is down.

Last updated: Oct 23, 2023, 11:36 AM (1 second ago)"




Re: Hyphens in man pages

2023-10-23 Thread G. Branden Robinson
At 2023-10-14T20:51:27-0600, Antonio Russo wrote:
> I discovered a new pet peeve today:

I must report with some dismay that this thread made LWN.

https://lwn.net/Articles/947941/

Would someone be willing to send me a subscriber-sponsored link to it?

I intend to attempt to address what I expect to be inevitable incorrect
statements about groff and/or man pages in the attached discussion
thread.

I am disappointed that LWN did not cover the groff 1.23.0 release, its
first in four years and featuring over 400 bug fixes,[1] even as a tiny
blurb on the back page, but seizes upon this issue, which quietly died
down a week ago, and should therefore have aged out of the current
window of "weekly" news.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/info-gnu/2023-07/msg1.html




Re: Bug#1041731: Hyphens in man pages

2023-10-16 Thread Gioele Barabucci

On 16/10/23 08:36, Gard Spreemann wrote:

> I've also found scdoc to be a quite pleasant and very lightweight alternative


I've noticed that the bullet points in man pages produced by scdoc are 
"off" compared to those produced by "pod2man". For example compare


https://manpages.debian.org/unstable/scdoc/scdoc.5.en.html#LISTS

to

https://manpages.debian.org/unstable/debhelper/debhelper.7.en.html#Substitutions_in_debhelper_config_files

The scdoc-produced lists are typeset with small dots, and have no spaces 
between the bullet points and the text. The pod2man lists have instead 
large dots and contain a space between the bullet points and the text.


Perhaps it is just a bug in the man-to-html conversion? Or is scdoc
producing wrong roff markup?


Regards,

--
Gioele Barabucci



Re: Bug#1041731: Hyphens in man pages

2023-10-16 Thread Colin Watson
My plan, as indicated in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1041731#62, had been
to leave things much as they are for most of the period while trixie is
in development, and then put the ".char - \-" etc. workarounds back in
place for nroff output for trixie's release; this would have meant a
higher chance of more manual page authoring tools being updated to
handle the groff input language more strictly (although this isn't
always easy, as Russ has indicated, since sometimes the input languages
of those tools are less rich than groff).

However, after wading through an enormous amount of inordinately verbose
stuff in my inbox about this, I'm afraid I've now lost patience with the
whole thing and am definitely not willing to put up with it for another
year or more, so I'm putting the workaround back in place now.  Sorry to
anyone who will end up dissatisfied by non-terminal printed output as a
result.

  https://salsa.debian.org/debian/groff/-/commit/d5394c68d7

It is still true that being strict about the use of the "\-", "\[aq]",
"\[ga]", "\[ha]", and "\[ti]" escape sequences (as opposed to "-", "'",
"`", "^", and "~" respectively) will produce better printed output.
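The following minimal Python sketch illustrates that rule. The helper name `roff_literal` is hypothetical. It rewrites text meant to be typed or pasted verbatim, such as a command or a file name, using the escapes Colin lists. Following groff_man_style(7), it also maps a literal backslash to `\e`. It does not handle a `.` or `'` at the start of a line, which roff reads as a control character. It must not be applied to prose, where a plain `-` is a genuine hyphen.

```python
# Escapes for characters that are special in roff input when the
# literal basic Latin character is wanted in the output.
ROFF_LITERAL_ESCAPES = {
    "\\": r"\e",     # backslash is the roff escape character
    "-": r"\-",      # hyphen-minus rather than a typographic hyphen
    "'": r"\[aq]",   # neutral apostrophe, not a right quotation mark
    "`": r"\[ga]",   # grave accent, not a left quotation mark
    "^": r"\[ha]",   # circumflex ("hat")
    "~": r"\[ti]",   # tilde
}


def roff_literal(text: str) -> str:
    """Escape text that readers are expected to type or paste verbatim."""
    # Map one character at a time, so inserted escapes are never re-escaped.
    return "".join(ROFF_LITERAL_ESCAPES.get(ch, ch) for ch in text)


print(roff_literal("git log --format='%h' ~/src"))
# git log \-\-format=\[aq]%h\[aq] \[ti]/src
```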

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: Bug#1041731: Hyphens in man pages

2023-10-16 Thread Gard Spreemann
On October 16, 2023 2:41:08 AM GMT+02:00, "Trent W. Buck" wrote:
>FWIW, there are lighter alternatives than pandoc:
>
>pandoc:                           After this operation, 174 MB of
>                                  additional disk space will be used.
>sphinx-doc (sphinx-build -b man): After this operation, 140 MB of
>                                  additional disk space will be used.
>rst2man (python3-docutils):       After this operation, 37.6 MB of
>                                  additional disk space will be used.
>pod2man (perl):                   perl is already the newest version
>                                  (5.36.0-9).
>
>I'm not going to bother measuring docbook ;-)

I've also found scdoc to be a quite pleasant and very lightweight alternative: 
https://tracker.debian.org/pkg/scdoc

--  Gard



Re: Bug#1041731: Hyphens in man pages

2023-10-15 Thread G. Branden Robinson
At 2023-10-15T13:11:47-0700, Russ Allbery wrote:
> Sorry for my original message, which was very poorly worded and
> probably incredibly confusing.  Let me try to make less of a hash of
> it.  I think what I'm proposing is something like:

My reply to this didn't make it to the -devel list even after several
hours; I suppose it was blocked due to my inclusion of a PDF attachment.

Those who are curious about the issue can read it at
.

Regards,
Branden




Re: Bug#1041731: Hyphens in man pages

2023-10-15 Thread Trent W. Buck
On Sun 15 Oct 2023 17:33:07 +0200, Iustin Pop wrote:
> At least you're not lazy. I am, so what I did many times is add a
> build-depends on pandoc, and write the man page in rst or md. I think
> that's a worse solution (pandoc is really heavy), but at least, I don't
> have to go back to *roff.

FWIW, there are lighter alternatives than pandoc:

pandoc:                           After this operation, 174 MB of
                                  additional disk space will be used.
sphinx-doc (sphinx-build -b man): After this operation, 140 MB of
                                  additional disk space will be used.
rst2man (python3-docutils):       After this operation, 37.6 MB of
                                  additional disk space will be used.
pod2man (perl):                   perl is already the newest version
                                  (5.36.0-9).

I'm not going to bother measuring docbook ;-)

If you are writing manpages by hand, this is an excellent overview:

https://manpages.debian.org/bookworm/manpages/man.7.en.html

See also:

https://www.oreilly.com/library/view/mastering-perl/9780596527242/ch15.html 
(POD)
https://www.docutils.org/docs/user/manpage.html#todo-open-issues




Re: Bug#1041731: Hyphens in man pages

2023-10-15 Thread Russ Allbery
"G. Branden Robinson" writes:

> How about this?

>  \- Minus sign.  \- produces the basic Latin hyphen‐minus
> specifying Unix command‐line options and frequently used in
> file names.  “-” is a hyphen in roff; some output devices
> replace it with U+2010 (hyphen) or similar.

Sorry for my original message, which was very poorly worded and probably
incredibly confusing.  Let me try to make less of a hash of it.  I think
what I'm proposing is something like:

\-   Basic Latin hyphen‐minus (U+002D) or ASCII hyphen.  This is the
 character used for Unix command‐line options and frequently in file
 names.  It is non-breaking; roff will not wrap lines at this
 character.  "-" (without the "\") is a true hyphen in roff, which is
 a different character; some output devices replace it with U+2010
 (hyphen) or similar.

What I was trying to get at but didn't express very well was to include
the specific Unicode code point and to avoid the term "minus sign" because
this character is not a minus sign in typography at all (although it is
used that way in code).  A minus sign is U+2212 and looks substantially
different because it is designed to match the appearance of the plus sign.
(For example, the line is often at a different height.)  I don't know if
*roff has a way of producing that character apart from providing it as
Unicode.
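Russ's distinction can be checked against the Unicode character database. This short Python sketch uses a hypothetical `describe` helper to print the code point and official name of the three characters in question. It also shows why a search for the ASCII form misses text rendered with a real hyphen:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("-", "\u2010", "\u2212"):
    print(describe(ch))
# U+002D HYPHEN-MINUS
# U+2010 HYPHEN
# U+2212 MINUS SIGN

# A search for the typed option fails on text rendered with U+2010.
print("-e" in "zgrep \u2010e")  # False
```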

The above also explicitly says that it's non-breaking (I believe that's
the case, although please tell me if I got that wrong) and is more
(perhaps excessively) explicit about distinguishing it from "-" because of
all the confusion about this.

-- 
Russ Allbery (r...@debian.org)  



Re: Bug#1041731: Hyphens in man pages

2023-10-15 Thread G. Branden Robinson
Hi Russ,

At 2023-10-15T12:06:14-0700, Russ Allbery wrote:
> Minor point, but since you posted it

No worries!

> "G. Branden Robinson" writes:
> 
> > ...
> 
> >  \- Minus sign or basic Latin hyphen‐minus.  \- produces the
> > Unix command‐line option dash in the output.  “-” is a
> > hyphen in the roff language; some output devices replace it
> > with U+2010 (hyphen) or similar.
> 
> The official name of "the Unix command-line option dash" is the
> hyphen-minus character (U+002D).  Given how much confusion there is
> about this, and particularly given how ambiguous the word "dash" is in
> typography (the hyphen-minus is one of 25 dashes in Unicode), you may
> want to say that explicitly in addition to saying that it's the
> character used in UNIX command-line options (and, arguably as
> importantly, in UNIX command names).

How about this?

 \- Minus sign.  \- produces the basic Latin hyphen‐minus
specifying Unix command‐line options and frequently used in
file names.  “-” is a hyphen in roff; some output devices
replace it with U+2010 (hyphen) or similar.

Regards,
Branden




Re: Hyphens in man pages

2023-10-15 Thread G. Branden Robinson
At 2023-10-15T10:01:20-0700, Russ Allbery wrote:
> I think my position at this point as pod2man maintainer (not yet
> implemented in podlators) is that every occurrence of - in POD source
> will be translated into \-, rather than using the current heuristics,
> and people who meant to use ‐ should type it directly in the POD
> source.  pod2man now supports Unicode fairly well and will pass that
> along to *roff, which presumably will do the right thing with it after
> character set translation.

It will, as long as something (like preconv(1)) translates the UTF-8
into something GNU troff can understand.  One of the most painful
decisions James Clark made was to follow AT&T's example and use "char"
as the fundamental character type, instead of throwing his elbows with
an "int" (or better yet, an int-sized C++ type, since C++ had real type
checking in 1989, while K&R C was still in vogue and scoffed at such
gratuities).[1]  I took a stab at changing this about 3 years ago but
it was too big a bite.  I didn't know enough yet about how the formatter
worked.  If I have n months to set aside I suspect I can get it done on
a second attempt.

Anyway, to illustrate.  (UTF-8 follows.)

$ for n in $(seq 8); do printf 'abc\\[u2010]defgh '; done | nroff | cat -s
abc‐defgh  abc‐defgh abc‐defgh abc‐defgh abc‐defgh abc‐defgh abc‐
defgh abc‐defgh
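The `\[u2010]` form in that illustration is the same kind of escape that preconv(1) emits for non-ASCII input. That is how UTF-8 text reaches a formatter whose fundamental character type is `char`. The Python function below is only a rough approximation of that step. The real preconv also detects the input encoding and handles more cases. `to_troff_escapes` is a hypothetical name:

```python
def to_troff_escapes(text: str) -> str:
    """Keep ASCII as is; turn every other code point into \\[uXXXX]."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Format as \[uXXXX] with upper-case hex digits, matching the escape above.
        out.append(ch if cp < 0x80 else f"\\[u{cp:04X}]")
    return "".join(out)


print(to_troff_escapes("abc\u2010defgh"))  # abc\[u2010]defgh
```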


> Currently, pod2man uses an extensive set of heuristics, but I think
> this is a lost cause.  I cannot think of any heuristic that will
> understand that the - in apt-get should be U+002D (so that one can
> search for the command as it is typed), but the - in apt-like should
> be apt‐like, since this is an English hyphenated expression talking
> about programs that are similar to apt.  This is simply not
> information that POD has available to it unless the user writing the
> document uses Unicode hyphens.

Yes.  This is the same point I was trying to make with my mg(1) man
page example.

> I believe the primary formatting degradation will be for very long
> hyphenated phrases like super-long-adjectival-phrase-intended-as-a-
> joke, because *roff will now not break on those hyphens that have been
> turned into \-.  People will have to rewrite them using proper Unicode
> hyphens to get proper formatting.

Even that can be overcome.  You can tell groff that a line can be broken
after a minus sign.  But I'm going to stone-facedly require people to
RTFM for that.  The character remapping in the PROBLEMS file is the
prescribed band-aid for those who can't or don't care to fix bad
typography in man pages, and I'd prefer not to see additional cargo cult
techniques piled on top of it.

https://git.savannah.gnu.org/cgit/groff.git/tree/PROBLEMS?h=1.23.0#n82

Regards,
Branden

[1] Just like the omission of bounds checks on array types.  What a
brilliant efficiency that was.  Jean Ichbiah saw Dennis Ritchie
coming a mile away in the 1970s, and Ada 83 did the right thing--in
countless respects.  Compiler authors squealed like pigs in hot oil
at the idea of doing any amount of static analysis of input--this is
back when compilers would not _automatically_ pass anything in
registers at all (_everything_ hit the stack) and common
subexpression elimination was regarded as a state-of-the-art
optimization--and spent over a decade slandering Ada's name in every
forum available to them.  Nowadays, static analysis is cool and
compiler engineers make big, big bucks developing its techniques
professionally.  And I'll bet you those who have even heard of Ada
still turn their noses up at it.

Stick around, and I'll share the secret legacy of the hated IA-64...




Re: Hyphens in man pages

2023-10-15 Thread Gioele Barabucci

On 15/10/23 19:13, Johannes Schauer Marin Rodrigues wrote:

> Quoting Gioele Barabucci (2023-10-15 17:59:32)
>
>> On 15/10/23 17:33, Iustin Pop wrote:
>>
>>> At least you're not lazy. I am, so what I did many times is add a
>>> build-depends on pandoc, and write the man page in rst or md. I think
>>> that's a worse solution (pandoc is really heavy), but at least, I don't
>>> have to go back to *roff.
>>
>>
>> Another option for the members of the lazy club is `podlators-perl`.
>
>
> that is a virtual package provided by perl. What mechanism exactly are you
> referring to when mentioning podlators-perl?


I was referring to `pod2man`, provided by the `podlators` Perl module.

`podlators-perl` is currently provided by `perl` but once upon a time it 
was a standalone package. Old habits. :)


--
Gioele Barabucci



Re: Hyphens in man pages

2023-10-15 Thread Russ Allbery
Minor point, but since you posted it

"G. Branden Robinson" writes:

> ...

>  \- Minus sign or basic Latin hyphen‐minus.  \- produces the
> Unix command‐line option dash in the output.  “-” is a
> hyphen in the roff language; some output devices replace it
> with U+2010 (hyphen) or similar.

The official name of "the Unix command-line option dash" is the
hyphen-minus character (U+002D).  Given how much confusion there is about
this, and particularly given how ambiguous the word "dash" is in
typography (the hyphen-minus is one of 25 dashes in Unicode), you may want
to say that explicitly in addition to saying that it's the character used
in UNIX command-line options (and, arguably as importantly, in UNIX
command names).

-- 
Russ Allbery (r...@debian.org)  



Re: Hyphens in man pages

2023-10-15 Thread G. Branden Robinson
Hi Wookey,

At 2023-10-15T16:08:32+0100, Wookey wrote:
> OK. So I read all that, and learned a whole load of stuff I was quite
> happy not knowing about.
> 
> However despite reading it all, and especially this bit:
> > Whenever I've maintained man pages in roff I tend to be precise in
> > the usage of - and \-, but TBH this has seemed like a lost battle,
> 
> I was left not actually knowing what - and \- represent, nor which one I
> _should_ be using in my man pages. And that seems to be the one thing
> we should be telling the 'average maintainer'.
> 
> I think you can consider me representative of the typical maintainer
> whose interaction with *roff languages almost entirely takes the
> form: 'Oh bloody hell I really ought to write a man page for this
> because upstream is too youthful to have done so - now how the hell
> does roff/nroff/groff work again' (no I'm not sure which it is I'm
> actually using, nor how any of this machinery really works, nor where
> to look for good practice, so I mostly copy existing stuff and DDG for
> answers, which is less than ideal when it comes to details like this).
> 
> So this message is mostly a reminder that most people have not been
> following along at all, so just referring people to bugs like this,
> which discuss the issue in some detail, is not sufficient for
> maintainers to stop doing unhelpful things.
> 
> (Yes I realise I could look it up, but I get the impression that
> everyone involved in this discussion assumes people know what '-' and
> '\-' are so if they are just told to 'use the right one' will do so,
> and I thought it worth pointing out that that's not correct). Info for
> your average maintainer needs to go one step back and say "use stringA
> in this circumstance and stringB in this circumstance.  what they represent>. The reason why it matters is: stuff about hyphen
> and minus being different and minus being used in commands and
> cut+pasting being important"

Yes, I appreciate your popping of the context stack.

Andreas and Russ provided good, quick answers.  One can reasonably
wonder where to find the same answer in groff's documentation.

Subsection "Fundamental character set" of the groff_char(7) man page
covers the matter, but like the bug report we've Cced, it goes into
great detail.

Subsection "Portability" of groff_man_style(7) (or groff_man(7) in groff
1.22.4) covers the subject in a more practical, how-to manner.

[UTF-8 follows.]

groff_man_style(7):
 ... Some escape sequences
 are however required for correct typesetting even in man pages and
 usually do not cause portability problems.

 Several of these render glyphs corresponding to punctuation code
 points in the Unicode basic Latin range (U+0000–U+007F) that are
 handled specially in roff input; the escape sequences below must be
 used to render them correctly and portably when documenting
 material that uses them syntactically—namely, any of the set ' - \
 ^ ` ~ (apostrophe, dash or minus, backslash, caret, grave accent,
 tilde).

...

 \- Minus sign or basic Latin hyphen‐minus.  \- produces the
Unix command‐line option dash in the output.  “-” is a
hyphen in the roff language; some output devices replace it
with U+2010 (hyphen) or similar.

 \(aq   Basic Latin neutral apostrophe.  Some output devices format
“'” as a right single quotation mark.

...

 \(ga   Basic Latin grave accent.  Some output devices format “`” as
a left single quotation mark.

 \(ha   Basic Latin circumflex accent (“hat”).  Some output devices
format “^” as U+02C6 (modifier letter circumflex accent) or
similar.

 \(rs   Reverse solidus (backslash).  The backslash is the default
escape character in the roff language, so it does not
represent itself in output.  Also see \e above.

 \(ti   Basic Latin tilde.  Some output devices format “~” as U+02DC
(small tilde) or similar.
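A bare `-` in an option name is the most common slip, so a crude lint can help when auditing existing pages. This Python sketch is purely heuristic; as Russ notes elsewhere in the thread, no heuristic gets every case right. It flags a `-` or `--` followed by a letter at the start of a word when it is not already written as `\-`. It ignores hyphens inside words such as "apt-like". The pattern and the `suspicious_dashes` name are assumptions, not part of groff:

```python
import re

# A '-' or '--' followed by a letter, not preceded by a backslash, a
# word character, or another '-': e.g. "-e" or "--help", but not the
# hyphen in "apt-like" and not an already escaped "\-e".
BARE_OPTION = re.compile(r"(?<![\\\w-])--?[A-Za-z][\w-]*")


def suspicious_dashes(line: str) -> list[str]:
    """Return option-like tokens that still use a bare '-'."""
    return [m.group(0) for m in BARE_OPTION.finditer(line)]


print(suspicious_dashes(r"Use -e or --help, not \-x; apt-like tools."))
# ['-e', '--help']
```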

> Hope that's helpful.

I hope this message goes some way toward relieving your frustration.

Regards,
Branden




Re: Hyphens in man pages

2023-10-15 Thread Johannes Schauer Marin Rodrigues
Hi,

Quoting Gioele Barabucci (2023-10-15 17:59:32)
> On 15/10/23 17:33, Iustin Pop wrote:
> > At least you're not lazy. I am, so what I did many times is add a
> > build-depends on pandoc, and write the man page in rst or md. I think
> > that's a worse solution (pandoc is really heavy), but at least, I don't
> > have to go back to *roff.
> 
> Another option for the members of the lazy club is `podlators-perl`.

that is a virtual package provided by perl. What mechanism exactly are you
referring to when mentioning podlators-perl?

> The `.pod` syntax is OK, and it is not as heavy a dependency as pandoc.

When I have to write a new man page I always write POD instead of troff and
then run pod2man on it. I have yet to find something I wanted to put in a man
page that I was unable to express via the POD format.

Another reason I'm a fan of pod2man is that it's possible to embed POD
documentation into scripts that are not Perl. For example debvm does this and
even though the debvm tools are written in POSIX shell, pod2man is doing the
right thing:

pod2man /usr/bin/debvm-run | man -l -
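pod2man can read such a script because POD parsers skip everything outside a POD block. A block starts at a line beginning with `=` and a command name (such as `=head1`) and ends after a `=cut` line. The Python sketch below shows that rule in miniature. The `: <<'=cut'` here-document makes the shell skip over the POD. It is one common convention and an assumption here, not necessarily what debvm does. `extract_pod` is a hypothetical helper:

```python
import re

POD_START = re.compile(r"^=[A-Za-z]")


def extract_pod(source: str) -> str:
    """Keep only the lines a POD parser would treat as POD."""
    kept, in_pod = [], False
    for line in source.splitlines():
        if not in_pod and POD_START.match(line):
            in_pod = True  # e.g. "=head1 NAME" opens a POD block
        if in_pod:
            kept.append(line)
            if line.startswith("=cut"):
                in_pod = False  # back to program text
    return "\n".join(kept)


script = """#!/bin/sh
: <<'=cut'
=head1 NAME

demo - print a greeting

=cut
echo hello
"""
print(extract_pod(script))
```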

Thanks!

cheers, josch



Re: Hyphens in man pages

2023-10-15 Thread Russ Allbery
Wookey writes:

> I was left not actually knowing what - and \- represent, nor which one I
> _should_ be using in my man pages. And that seems to be the one thing we
> should be telling the 'average maintainer'.

- turns into a real hyphen (‐, U+2010).  \- turns into the ASCII
hyphen-minus that we use for options, programming, and so forth (U+002D).

I think my position at this point as pod2man maintainer (not yet
implemented in podlators) is that every occurrence of - in POD source will
be translated into \-, rather than using the current heuristics, and
people who meant to use ‐ should type it directly in the POD source.
pod2man now supports Unicode fairly well and will pass that along to
*roff, which presumably will do the right thing with it after character
set translation.

Currently, pod2man uses an extensive set of heuristics, but I think this
is a lost cause.  I cannot think of any heuristic that will understand
that the - in apt-get should be U+002D (so that one can search for the
command as it is typed), but the - in apt-like should be apt‐like, since
this is an English hyphenated expression talking about programs that are
similar to apt.  This is simply not information that POD has available to
it unless the user writing the document uses Unicode hyphens.
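At its core, Russ's proposed rule is a blanket substitution, not a heuristic. This Python sketch shows its effect on his two examples when the author types U+2010 for the English hyphen. The function name is hypothetical. This is not podlators code, and it ignores everything else a POD-to-roff converter must escape:

```python
def pod_dashes_to_roff(text: str) -> str:
    """Every ASCII '-' (U+002D) becomes roff's \\- so it survives as a
    hyphen-minus; a U+2010 hyphen typed by the author passes through
    for *roff to handle after character set translation."""
    return text.replace("-", "\\-")


print(pod_dashes_to_roff("apt-get is not the only apt\u2010like tool"))
# apt\-get is not the only apt‐like tool
```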

I believe the primary formatting degradation will be for very long
hyphenated phrases like super-long-adjectival-phrase-intended-as-a-joke,
because *roff will now not break on those hyphens that have been turned
into \-.  People will have to rewrite them using proper Unicode hyphens to
get proper formatting.

-- 
Russ Allbery (r...@debian.org)  



Re: Hyphens in man pages

2023-10-15 Thread Gioele Barabucci

On 15/10/23 17:33, Iustin Pop wrote:

> At least you're not lazy. I am, so what I did many times is add a
> build-depends on pandoc, and write the man page in rst or md. I think
> that's a worse solution (pandoc is really heavy), but at least, I don't
> have to go back to *roff.


Another option for the members of the lazy club is `podlators-perl`.

The `.pod` syntax is OK, and it is not as heavy a dependency as pandoc.

Regards,

--
Gioele Barabucci



Re: Hyphens in man pages

2023-10-15 Thread Andreas Metzler
On 2023-10-15 Wookey wrote:
[...]
> OK. So I read all that, and learned a whole load of stuff I was quite
> happy not knowing about.

> However despite reading it all, and especially this bit:
> "Whenever I've maintained man pages in roff I tend to be precise in
> > the usage of - and \-, but TBH this has seemed like a lost battle,"

> I was left not actually knowing what - and \- represent, nor which one I
> _should_ be using in my man pages. And that seems to be the one thing
> we should be telling the 'average maintainer'.
[...]

Hello,

a pretty good guidance[1] is to

use "\-" whenever it refers to an option ("-h", "--help"), an argument
("find -mmin -2"), or something else that is not natural language but
might be pasted, like a command name ("ssh-add" or "dpkg-source")

and "-" everywhere else.
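Here is one way to apply this guidance when generating man page text, sketched in Python. The author marks which pieces are literal (options, arguments, command names), and only those get `\-`. The `man_text` helper and its (text, literal) pairs are invented for illustration:

```python
def man_text(*parts: tuple[str, bool]) -> str:
    """Join (text, literal) pairs into one roff text line.

    Literal pieces, which a reader might paste into a shell, get '\\-';
    natural-language pieces keep '-', which roff treats as a hyphen.
    """
    return "".join(text.replace("-", "\\-") if literal else text
                   for text, literal in parts)


print(man_text(("Try ", False), ("find -mmin -2", True),
               (" for a quick-and-dirty check.", False)))
# Try find \-mmin \-2 for a quick-and-dirty check.
```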

cu Andreas

[1] Well it is "guidance": pasting will work, but there might still be
places in the prose where a dash would be typographically correct. Some
of these typographical conventions are language-specific. All this is
familiar to LaTeX users.
-- 
`What a good friend you are to him, Dr. Maturin. His other friends are
so grateful to you.'
`I sew his ears on from time to time, sure'



Re: Hyphens in man pages

2023-10-15 Thread Iustin Pop
On 2023-10-15 16:08:32, Wookey wrote:
> I think you can consider me representative of the typical maintainer
> whose interaction with *roff languages almost entirely takes the
> form: 'Oh bloody hell I really ought to write a man page for this
> because upstream is too youthful to have done so - now how the hell
> does roff/nroff/groff work again' (no I'm not sure which it is I'm
> actually using, nor how any of this machinery really works, nor where
> to look for good practice, so I mostly copy existing stuff and DDG for
> answers, which is less than ideal when it comes to details like this).

At least you're not lazy. I am, so what I did many times is add a
build-depends on pandoc, and write the man page in rst or md. I think
that's a worse solution (pandoc is really heavy), but at least, I don't
have to go back to *roff.

regards,
iustin



Re: Hyphens in man pages

2023-10-15 Thread Wookey
On 2023-10-15 01:30 -0500, G. Branden Robinson wrote:
> At 2023-10-14T20:51:27-0600, Antonio Russo wrote:
> 
> Quick background: in the context of Unix usage as documented by
> nroff/troff, the dash used at the shell prompt, in text editors, and in
> programming language source code is a "minus sign".  troff has an em
> dash special character as well since the mid-1970s; groff adds an en
> dash as well, and furthermore supports user definition of characters
> providing access to any other sort of dash that comes down the Unicode
> pike.  (Not that doing so is a good idea in a man page; see below
> regarding a "restricted dialect" of man(7).)
> 
> > Now, depending on your email client and settings, the above will
> > appear to be the ravings of an unhinged lunatic who wrote the same
> > thing twice, or an unhinged lunatic who slammed their fists onto the
> > keyboard.
> 
> This issue does indeed have a history of provoking unhinged lunacy.
> 
> Before we proceed, you might wish to be aware of
>  and its
> proposed remedy.

OK. So I read all that, and learned a whole load of stuff I was quite
happy not knowing about.

However despite reading it all, and especially this bit:
"Whenever I've maintained man pages in roff I tend to be precise in
> the usage of - and \-, but TBH this has seemed like a lost battle,"

I was left not actually knowing what - and \- represent, nor which one I
_should_ be using in my man pages. And that seems to be the one thing
we should be telling the 'average maintainer'.

I think you can consider me representative of the typical maintainer
whose interaction with *roff languages almost entirely takes the
form: 'Oh bloody hell I really ought to write a man page for this
because upstream is too youthful to have done so - now how the hell
does roff/nroff/groff work again' (no I'm not sure which it is I'm
actually using, nor how any of this machinery really works, nor where
to look for good practice, so I mostly copy existing stuff and DDG for
answers, which is less than ideal when it comes to details like this).

So this message is mostly a reminder that most people have not been
following along at all, so just referring people to bugs like this,
which discuss the issue in some detail, is not sufficient for
maintainers to stop doing unhelpful things.

(Yes I realise I could look it up, but I get the impression that
everyone involved in this discussion assumes people know what '-' and
'\-' are so if they are just told to 'use the right one' will do so,
and I thought it worth pointing out that that's not correct). Info for
your average maintainer needs to go one step back and say "use stringA
in this circumstance and stringB in this circumstance. . The reason why it matters is: stuff about hyphen
and minus being different and minus being used in commands and
cut+pasting being important"

Hope that's helpful.

Wookey
-- 
Principal hats:  Debian, Wookware, ARM
http://wookware.org/




Re: Hyphens in man pages

2023-10-15 Thread G. Branden Robinson
At 2023-10-14T20:51:27-0600, Antonio Russo wrote:
> I discovered a new pet peeve today: if you search for a command in a
> manual page, say -e in man 1 zgrep, it's a crapshoot whether just
> searching for '-e' will find the command or not.  The reason is that
> "-" may have been accidentally encoded as ‐ instead of -.

You can blame me for this.

https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS?h=1.23.0#n206

...me, and man page authors who don't think about whether they intend
a hyphen or a minus sign when they strike the '-' key...

Quick background: in the context of Unix usage as documented by
nroff/troff, the dash used at the shell prompt, in text editors, and in
programming language source code is a "minus sign".  troff has had an
em dash special character since the mid-1970s; groff adds an en dash,
and furthermore supports user definition of characters
providing access to any other sort of dash that comes down the Unicode
pike.  (Not that doing so is a good idea in a man page; see below
regarding a "restricted dialect" of man(7).)

> Now, depending on your email client and settings, the above will
> appear to be the ravings of an unhinged lunatic who wrote the same
> thing twice, or an unhinged lunatic who slammed their fists onto the
> keyboard.

This issue does indeed have a history of provoking unhinged lunacy.

Before we proceed, you might wish to be aware of
 and its
proposed remedy.

> The reason is that man(1) converts bare dashes (0x2D) to hyphens
> (U+2010).  These are not the same symbol: searching for one does not
> find the other without some kind of normalization, and pasting
> commands with one vs. the other does different things.  New users who
> do not understand this will be discouraged from trying to read manual
> pages.  Chances are, they will fill forums with mundane questions that
> could and should have been addressed by a simple search of a manual page.

I run into this problem, too, since I dogfood my own changes.  When
irritated by this, I try the search again, replacing '-' with '.', which
has yet to fail me (and produces false positives surprisingly rarely).
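A small Python illustration of why the '.' trick works, assuming the rendered page shows U+2010 HYPHEN where the source had a bare '-' (the sample text below is invented): a literal substring search for '-e' misses it, while a regular expression with '.' in place of the dash matches either character, just as a less(1) search pattern does.

```python
import re

# Invented sample of a rendered option line in which the page source used a
# bare '-', so the terminal shows U+2010 HYPHEN instead of U+002D HYPHEN-MINUS.
rendered = "\u2010e PATTERN, \u2010\u2010regexp=PATTERN"

# A plain substring search for the ASCII characters finds nothing.
exact_hit = "-e" in rendered

# Replacing the dash with '.' (any character) matches either variant.
wildcard_hit = re.search(r".e PATTERN", rendered)
```

The price is the occasional false positive, since '.' also matches letters and spaces.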

For example, I've recently been playing with the mg(1) editor, and
observed extremely poor discipline in this area.  So I forked it on
GitHub and have been preparing a bunch of revisions.  I wrote a sed
script to fix its numerous hyphen/dash problems.[1]

> I recently fixed a ton of these in another upstream package with this
> vim "one-liner":
> 
> :%s/--\([a-z]\+\)\(-[a-z]\+\)*/\=substitute(submatch(0), '-', '\\-', 'g')/g

My Vimscript is not very sophisticated, but it looks like you're
replacing only hyphens that appear in long option names here.  That's
good, as you're unlikely to clobber any hyphens that should _not_ become
minus signs.
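For readers who do not speak Vimscript, here is a rough Python rendering of the same substitution (an unofficial translation, not the code used above): it finds GNU-style long options made of lowercase letters and rewrites every '-' inside them as '\-', leaving other hyphens alone. The backslash lookbehind is an addition not present in the Vim version; it keeps a partially escaped token such as `\--foo` from becoming `\\-\-foo`, leaving it unchanged for manual review instead.

```python
import re

# Same shape as the Vim pattern --\([a-z]\+\)\(-[a-z]\+\)*, i.e. a GNU-style
# long option made of lowercase letters.  The (?<!\\) lookbehind is an
# addition: it skips a '--' that directly follows a backslash.
LONG_OPTION = re.compile(r"(?<!\\)--[a-z]+(?:-[a-z]+)*")


def escape_long_options(line: str) -> str:
    """Rewrite each '-' inside matched long options as the roff minus '\\-'."""
    return LONG_OPTION.sub(lambda m: m.group(0).replace("-", "\\-"), line)
```

Like the original, it only recognizes lowercase letters, so an option such as `--Foo` or `--utf8` passes through untouched.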

Such discernment is important.  Many people who want to "solve" this
issue forget (or ignore) that not every '-' is a minus sign.  Some are
actual hyphens, as in "long-term effects" and "word-aligned struct
members".  Trying to infer a distinction from white space adjacency also
won't work.  Consider the phrases "word- or byte-sized caching" and
"object-based vs. -oriented programming".  While sophistication with
compound hyphenated affixes is seldom seen in man pages, we most often
find it where a man page author has taken considerable care with their
technical writing.  Such pages are less likely than most to require
revision with blunt instruments like regular expression-based global
search and replace operations.
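To make the white-space point concrete, here is a deliberately naive heuristic in Python (hypothetical, not something anyone in this thread proposed): treat any '-' that begins a word as a minus sign.

```python
import re


def naive_escape(line: str) -> str:
    """Hypothetical heuristic: a '-' that starts a word is a minus sign."""
    # (?<!\S) means "at the start of the line or after white space".
    return re.sub(r"(?<!\S)-", r"\\-", line)
```

It handles "use -e PATTERN" and "word- or byte-sized caching" correctly, but turns "object-based vs. -oriented programming" into "object-based vs. \-oriented programming", even though that '-' is a hyphen standing in for "object-". Adjacency alone cannot separate the two cases.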

> However, this requires manual review

Surprisingly often, the composition of high-quality technical
documentation requires the engagement of a human brain.

> and does not fix the '-e' example from zgrep.

Mapping all hyphens and minus signs to a single character, as people
whose blood pressure spikes over this issue tend to promote as a first
resort, is an ineluctably information-discarding operation.  In my
opinion, man page source documents are not the correct place to discard
that information.

(I acknowledge that you didn't propose such a crude remedy; I write to
anticipate the inevitable follow-ups from people who will.)

Doing so at rendering time is much more defensible, and happens anyway
on devices that do not distinguish these characters in the first place.
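A sketch of folding at search or display time rather than in the source, in Python: the page source keeps its hyphens and minus signs, and only a throwaway copy of the rendered text is normalized before matching. Which code points a given renderer actually emits for '-' and '\-' depends on the output device and macro package, so the set folded here (U+2010 HYPHEN, U+2011 NON-BREAKING HYPHEN, U+2212 MINUS SIGN) is an assumption.

```python
# Dash-like code points a renderer might emit for '-' or '\-'; which ones
# actually appear depends on the output device and macro package (assumption).
DASH_FOLD = str.maketrans({"\u2010": "-", "\u2011": "-", "\u2212": "-"})


def fold_dashes(text: str) -> str:
    """Map hyphen and minus variants to U+002D on a copy of the text."""
    return text.translate(DASH_FOLD)


def contains_option(rendered: str, option: str) -> bool:
    """Search rendered man page text for an option, ignoring dash variants."""
    return fold_dashes(option) in fold_dashes(rendered)
```

Folding is lossy by design: after `fold_dashes`, "long‐term" with a real hyphen and a typed option dash are the same character, which is why it belongs on a display or search copy and not in the page source.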

> There is also a whole host of problems of this kind, e.g., dashes in
> URLs that get naively pasted into man pages (another live example I
> just addressed).

Yes, people commonly type URLs and email addresses into man page sources
as they would into an MUA or browser navigation bar.  Since U+2010 is
difficult to encode in such things, the man(7) package could help by
performing an automatic character translation in this area.  However,
(1) no one's actually asked for this and (2) it would address only a
tiny part of the problem.  The means of "help" I have in mind is
employment of the groff man(7) extension macros `UR`/`UE` and 

Re: Hyphens in man pages

2023-10-15 Thread Jochen Sprickerhof

Hi Antonio,

this is discussed in:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052675

Cheers Jochen




Hyphens in man pages

2023-10-14 Thread Antonio Russo
Hello,

I discovered a new pet peeve today: if you search for an option in a manual
page, say -e in man 1 zgrep, it's a crapshoot whether just searching for '-e'
will find the option or not.  The reason is that "-" may have been accidentally
encoded as ‐ instead of -.

Now, depending on your email client and settings, the above will appear to be
the ravings of an unhinged lunatic who wrote the same thing twice, or an
unhinged lunatic who slammed their fists onto the keyboard.

The reason is that man(1) converts bare dashes (0x2D) to hyphens (U+2010).
These are not the same symbol: searching for one does not find the other
without some kind of normalization, and pasting commands with one vs. the other
does different things.  New users who do not understand this will be
discouraged from trying to read manual pages.  Chances are, they will fill
forums with mundane questions that could and should have been addressed by a
simple search of a manual page.

I recently fixed a ton of these in another upstream package with this vim
"one-liner":

:%s/--\([a-z]\+\)\(-[a-z]\+\)*/\=substitute(submatch(0), '-', '\\-', 'g')/g

However, this requires manual review and does not fix the '-e' example from
zgrep.  There is also a whole host of problems of this kind, e.g., dashes in
URLs that get naively pasted into man pages (another live example I just
addressed).

I come here with several questions:

 - Am I off-base thinking this is a problem?
 - Should we really be using troff to typeset anything in the year 2023?
   (In particular, if we can make the source text more human-readable, we might
   be able to leverage LLMs on this wealth of information in the future and
   automate support.  Are LLMs "fluent" in troff? I have not investigated at all.)
 - Are there any alternatives that actually produce nice-looking man pages?
   (My experience with pandoc is that the source is still awkward, I literally
   just found another example of this bug in my own man page, and it looks
   pretty ugly in man. But maybe I just didn't find good examples/documentation.)
 - Should we try to come up with some lintian rules to flag this behavior?
   (This one: /--\([a-z]\+\)\(-[a-z]\+\)*/ finds long GNU-style options; I'd
   have to think for at least a little bit about finding short ones.  This would
   ultimately be fragile. For example, the above doesn't find partially broken
   tokens, i.e., ones with only one unescaped dash.)
   Automated tooling around this, more generally, seems fragile.  HTML might
   have been a nice compromise, but writing it appears to be out of vogue these
   days, despite being a pretty OK thing to read and write by hand. But
   seriously, I would love to be writing HTML instead of troff for manual pages.
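On the lintian question: lintian itself is written in Perl and has its own check framework, which is not modeled here. The following is only a Python sketch of the matching logic such a check might use, with made-up names. It strips font escapes such as `\fB`, looks for option-like tokens at the start of a word, and reports those containing any '-' not written as '\-', which also catches partially escaped tokens like `\--foo`.

```python
import re

# Font-change escapes such as \fB, \f(CW and \f[CR]; stripped so that
# "\fB-e\fR" is examined as "-e".
FONT_ESCAPE = re.compile(r"\\f(?:\[[^\]]*\]|\(..|.)")
# An option-like token: one or two dashes (escaped or not) at the start of a
# word, then a letter or digit, then word characters, dashes or backslashes.
OPTION_TOKEN = re.compile(r"(?<![\w\\-])(?:\\-|-){1,2}[A-Za-z0-9][\w\\-]*")
# A '-' that is not written as '\-'.
BARE_DASH = re.compile(r"(?<!\\)-")


def suspicious_tokens(line: str) -> list[str]:
    """Return option-like tokens in a man(7) source line with a bare '-'."""
    if line.startswith('.\\"'):  # roff comment line
        return []
    text = FONT_ESCAPE.sub("", line)
    return [m.group(0) for m in OPTION_TOKEN.finditer(text)
            if BARE_DASH.search(m.group(0))]
```

As expected, it is fragile: "-oriented" in "object-based vs. -oriented" is reported even though it is a hyphen, so its output is a list of candidates for human review, not a set of errors.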

Antonio
