[bug #64018] [man, mdoc] decide on a common base paragraph indentation

G. Branden Robinson Thu, 03 Aug 2023 16:13:11 -0700

Follow-up Comment #15, bug #64018 (project groff):

[comment #14:]
> > Possibly, _mdoc_(7) page authors knew this and carefully edited the
> > ones that did, so that now no one sees them.
> 
> Exactly, we do that… I recently also began rewording things to
> avoid hyphenation as well, although only with “french” spacing
> (no american double-space after a full stop) since that’s what
> I use in my BSD.



Dave Kemper and I (both Americans) have different views on how wide the space
after a sentence should be, but we both found the following resource
valuable.

https://web.archive.org/web/20171217060354/http://www.heracliteanriver.com/?p=324

_groff_ makes this parameter tunable for all output devices, so it's not a
source of strife, except among those who don't read documentation.

> >> We figured out that putting \& after punctuation only for stuff like
> >> “e.g.\&” (where you don’t want the american double-spacing after),
> >> and otherwise before (e.g. “\&.”
> 
> That would be “.Dq \&.”, for the sake of completeness.
> I also saw “.Dq .\&”, and, for some time, people were
> not clear about which one to use, but in discussion
> with Jörg, it got clear that the most portable is to
> put the \& in front always except “e.g.\&”.

For the case of ".Dq .", I agree.  The macro itself won't (shouldn't) put a
bare dot at the beginning of an (interpolated) input line, so the worry is
that interpolation will result in the quoted dot being treated as the end of a
sentence, just as:


He said, "the guys just left."


is.

> No, something like “.Dq \&Li”, that is, where I have a
> two-character argument to a parsed macro, so it isn’t
> interpreted as callable macro.
> 
> This application is distinct from escaping a leading
> dot or apostrophe or an end-of-line dot that’s not a
> full stop.

Agreed.  This is an additional application of the dummy character created by
_mdoc_(7)'s unique design.
 
> > Interesting.  I did not know `In` was a late-breaking macro
> 
> I first saw it in manpages from NetBSD, and OpenBSD did
> not have it, nor use it. (I think that before the switch
> to mdocml they didn’t change their tmacs much.)

Interesting.  Perhaps the feeling was that it wasn't worth maintaining any
aspect of _troff_ as the desire built to kill it.

> >> We cannot, obviously, have three-letter requests.
> > Nope.  Like I said, there's room for `Cq`, `Co`, and `Cc`.
> 
> Indeed, I see only Co used grepping through all tmacs:
> tmac.doc.old has it as macro (just .tm’ing to say it’s
> not an mdoc macro) plus…

It's in _groff_'s "doc-old.tmac", too, which has the same origin.

Huh.  I wonder what the story behind that is.

> | mdoc/README:.\" NS Co register (site) Width Needed for Column offset
> 
> … I’m not sure if this is still true, given my grep
> did not find any other occurrence? I think this is
> old/wrong and needs to be removed.

It seems likely to me.

I would guess that Ingo has the world's biggest corpus of _mdoc_(7) documents
readily at hand--but perhaps not the time to grep them for our benefit.  :P

> >> The codebase is the “last” nroff I could use under the Caldera
> >> licence, i.e. that was shipped with a BSD covered by these. The
> 
> > Is there anyplace these can currently be obtained?
> 
> I got them from minnie.tuhs.org; if your CVS skills are still
> not too rusty, you can get the subset I imported from MirBSD
> anoncvs, too.

My CVS skills _are_ pretty rusty but this wouldn't help me.  Every time
someone has pointed me to something they said was "1980s troff sources", it
was a descendant of Ossanna troff, not Kernighan's rewrite.  In other words,
it was 1970s _troff_.

I think the first time someone sent me on that goose chase it was to Kirk
McKusick's CSRG CD-ROMs.  Much great stuff, but not a vintage 1981 Kernighan
troff.  I want that *so bad*.
 
> And yes, it’s not the later one, it’s the old one where troff
> was for that one typesetter(?) machine. I *do* also have a
> tape archive of a ditroff predating 1990 which would be in the
> PD in the USA but not in the rest of the world, so I cannot
> use it (and trying to figure out who even _could_ give a
> licence is probably not worth the effort… I think it was
> Lucent labs at some point, and someone told me they generally
> don’t even have an idea about this),

Look into that DWB 3.3 link I shared.  This sounds like a very similar thing.

> so I had to bite the sour
> apple and use the last one from the Caldera drop, which is
> pretty much 1970s C code. No prototypes, and every variable
> (other than some which are short or char) is of the data type
> int or char* which are identical and interchangeable, and they
> manually paged part of the -fcommon data area relying on the
> in-memory layout to match the one from the source…

K&R C was the best language ever because it was so weakly typed. :-|


> > I hope that's a labor of love
> 
> Oh, definitely!
> 
> (This also allowed me to get rid of C++ from the base system;
> groff was the last remaining part, and now I can just install
> one from ports on the box where I render the ps→pdf docs.)

I regret that the implementation language makes _groff_ so stinky for people. 
Having seen the problems it was solving, I can understand why Clark selected
it over the just-born ANSI C.  It's much closer to applications programming
than systems programming, and C++ had much promise there.  Of course
Stroustrup promoted it as the best language for everything.  :-/

From today's perspective, in _groff_ there are huge amounts of data-structure
walking code that could be replaced by C++98 "algorithms" (they _are_, but God
the name assaults the nose like a bottle of brogrammer patchouli oil), or
cleanly replaced by C++11 idioms.

I get the feeling that Clark ended up not doing as much input validation as he
might simply because it was so incredibly tedious to walk data structures.

Here's a recent example of some validation I added, with annotations of future
possibilities.


bool is_family_valid(const char *fam)
{
  // std::vector<const char *> styles{"R", "I", "B", "BI"}; // C++11
  const size_t nstyles = 4;
  const char *st[nstyles] = { "R", "I", "B", "BI" };
  std::vector<const char *> styles(st, (st + nstyles));
  // for (auto style : styles) // C++11
  std::vector<const char *>::iterator style;
  for (style = styles.begin(); style != styles.end(); style++)
    if (!check_font(fam, *style))
      return false;
  return true;
}


C++11 would cut this function about in half.  Of course, the whole thing is
pure bloat from an Annotated Reference Manual C++ perspective, where you'd
skip all this ridiculous validation entirely because what could go wrong?  The
ARPAnet is such a friendly place...
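
For illustration only, here is a minimal sketch of how the C++11 form
suggested by the commented-out lines might read.  The check_font() below is
a hypothetical stand-in so the example compiles on its own (the real one
consults font description files), and the family names "T" and "X" are
made up for the demonstration.

```cpp
#include <cstring>
#include <vector>

// Hypothetical stub, NOT groff's real check_font(): pretend family "T"
// has all four styles and family "X" lacks bold-italic.
static bool check_font(const char *fam, const char *style)
{
  if (std::strcmp(fam, "T") == 0)
    return true;
  if (std::strcmp(fam, "X") == 0)
    return std::strcmp(style, "BI") != 0;
  return false;
}

// A family is valid only if every one of its four styles checks out.
bool is_family_valid(const char *fam)
{
  // Brace initialization replaces the array-plus-iterator-pair setup.
  const std::vector<const char *> styles{"R", "I", "B", "BI"};
  // Range-based for replaces the explicit iterator declaration and loop.
  for (auto style : styles)
    if (!check_font(fam, style))
      return false;
  return true;
}
```

With the stub above, is_family_valid("T") is true and is_family_valid("X")
is false, since the loop bails out at the first style that fails.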
 
> As for tbl, IIUC the limitation is because of the limitation
> on string names in nroff.

Yeah, the two-character name space feels limited in short order.  But it made
Ken Thompson's fan club ecstatic.

> I wonder if I can relax the latter
> a little in my implementation, having already raised the amount
> of things it can handle, just enough to make that page work…
> 
> … gah, not tonight. No nerdsnipey for me.

Maybe I'll get you next time...


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64018>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
