Follow-up Comment #6, bug #68326 (group groff):

[comment #5:]
> You seem to have not yet seen the rest of the thread, in which I attach the
> configuration.nix.5 file.

Right.  I hadn't.  But I see it now.


$ time grog configuration.nix.5 
groff -man configuration.nix.5

real    0m21.393s
user    0m21.367s
sys     0m0.004s


That's impressively bad, too.  Still...


$ ls -hl configuration.nix.5 
-rw-r--r-- 1 branden branden 9.8M May  8 11:14 configuration.nix.5


That is unprecedentedly large for a man page.

> Accordingly, I'm going to limit my response to only things which are not
> better communicated by inspecting the file itself.

It looks highly uniform in style, and also not highly idiomatic for a _man_(7)
document.[1]  It doesn't identify itself as autogenerated by a tool, but that
seems likely.

The date stamp of 1 January 1980 is not plausible.  Many systems of that era
didn't even have storage devices capable of housing 9.8 megabytes of data.
(Ever seen the movie _WarGames_?  Think of David Lightman's IMSAI 8080 with
the two eight-inch floppy drives.)
 
>> What is this "very long document"?  Is it a practical one?
> 
> configuration.nix.5 is the documentation for NixOS, a popular Linux
> distribution and a nascent FreeBSD distribution.

I've heard of it.  I gather that GNU's Guix is a similar effort.
 
> It coalesces a huge number of configuration options from a large number of
> sources.

Evidently!

> It is one of my most frequently accessed man pages and, barring the use of a
> different type of document reader entirely, essential for configuring one's
> NixOS system, since the alternative is having a checkout of the
> distribution's package repository and performing lots of very fragile text
> searches.

Okay.  I have some practical advice for you, then.


$ time nroff -Ww -man configuration.nix.5 | head > /dev/null

real    0m4.337s
user    0m5.598s
sys     0m0.161s


I agree that this sort of delay is hard to bear.

Since the problem is "continuous rendering mode", whereby the entire 9.8MB
document is rendered to, let's see...one 380,050-line page, why not turn
continuous rendering off?

groff_man(7):


     -rcR=1   Enable continuous rendering.  Output is not paginated;
              instead, one (potentially very long) page is produced.
              This is the default for terminal and HTML devices.  Use
              -rcR=0 to disable it on terminals; on HTML devices, it
              cannot be disabled.


So let's try that.


$ time nroff -Ww -r cR=0 -man configuration.nix.5 | head > /dev/null

real    0m0.028s
user    0m0.028s
sys     0m0.007s


Since such a gigantic document (7,268 pages) is humanly impossible to peruse
on a frequent basis, I infer that the frustration with slow initial rendering
is that you want to get the document up in the pager quickly so that you can
key in a search immediately.  The foregoing should help with that.
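
If that works for you, it could be wrapped up so that it need not be retyped.  A minimal sketch, assuming a POSIX shell: the function name `bigman` is made up for illustration, and the `NROFF` and `PAGER` overrides are only a convenience of this sketch, not something nroff itself consults.

```shell
# bigman: hypothetical wrapper (not part of groff or man-db) that formats
# a man page with continuous rendering disabled and hands it to a pager.
# NROFF and PAGER can be overridden; the defaults mirror the commands
# shown in this comment.
bigman() {
    ${NROFF:-nroff} -Ww -r cR=0 -man "$@" | ${PAGER:-less -R}
}
```

With that defined, `bigman configuration.nix.5` should bring up the first page without waiting for the whole document to be formatted.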

>> Yes, that's because the entire page has to be addressable.
> 
> Indeed, if it is within the aim of the groff project to be able to render
> arbitrarily long documents where any byte of input can affect any byte of
> output, then this patch is no good.

The ability of "bytes" to affect other "bytes" is not a first principle of
_groff_.  Here's something closer to fundamental material.


A page is a two-dimensional structure upon which a roff system imposes a
rectangular coordinate system with its origin near the upper left corner.
Coordinate values are in basic units and increase down and to the right.
Useful ones are typically positive and within numeric ranges corresponding to
the page boundaries.


https://www.gnu.org/software/groff/manual/groff.html.node/Page-Geometry.html

So, for instance, GNU _troff_, like the AT&T _troff_ program it reimplements,
wants to support esoteric and bizarre typesetting operations like, say,
drawing a rule (line) from the bottom-right to the top-left corner of the page
(and maybe the complementary diagonal, making a big "X" on the page).

> I was aware of the fact that this was at least in theory a capability of the
> troff/grotty pipeline while I was developing this patch, and as such provided
> the accommodation that it keeps a buffer of at least 10k lines (16k in
> practice) and only flushes 1k lines at a time, allowing any byte of input to
> write to any byte of output so long as there have been at most 9k lines (15k
> in practice) declared to exist following the targeted line. I assumed this
> was a reasonable limitation. If not, we will obviously have to figure out
> something else.

Give `-r cR=0` a try and let me know how it works out for you.
 
> The easiest compromise would be to gate this new behavior behind a command
> line flag or environment variable.

That would still break any rendering scenario where rules are drawn upward far
enough in the page.  I'm not enthusiastic about adding command-line options to
enable incorrect rendering.

> A potentially less hacky solution would be to add something to the man macros
> which allows ending a page even in continuous rendering mode.

Or, one could, you know, paginate.

> I actually looked into this solution before reaching to patch grotty, but the
> groff macro language is a bit impenetrable to me. If this is wanted, I'll try
> again.

_grotty_ is an output driver; it's not the right place to change the "paper
format".  Or, more precisely, it's not sufficient to change it there.  The
formatter (GNU _troff_) has to know what the paper format is too.
 
>> If you're using man-db man(1), set `MANROFFOPT=-Z`.  That takes grotty out
>> of the pipeline.
> 
> You seem to have responded to this below.
> 
>> I suggest you construct a man page such that most of the body text is a
>> tbl(1) table using the "allbox" region option.
> 
> More or less responded above. If hundreds of thousands of lines of vertical
> rule is a limitation we are working with, then so be it.

It is.  I seem to recall that Deri James, a fellow _groff_ developer and our
PDF expert, has a demonstration of using groff to render a document featuring
colorful tiling of most of the page area; I'm not sure any text was even on
it.

>> Also, what is the application, for the practical human man page reader, of
>> closing the pipe after the first ten lines and sending the output to
>> /dev/null?
> 
> Of course this is not a meaningful command to a user. Please trust that I
> have done the meaningful tests and have instead provided the simplest command
> which has only inputs I can provide and can actually be measured easily. I
> have of course tested each part of the pipeline in isolation, including with
> a real pager, and verified that the numbers presented here reflect an actual
> user's experience. In this case, the use of head is a rough proxy for
> time-to-first-byte, sending SIGPIPE to grotty soon after it produces its
> first byte. This allows the time command to provide a meaningful number,
> since it will wait for its pipeline to terminate entirely before concluding
> its measurement. The pipe to /dev/null also serves no purpose but to clean up
> the output, but I verified that it does not affect the timing result.

I've never heard of you before; trust requires a foundation.
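
For other readers of this ticket, the mechanism described in the quoted text can be seen in isolation with a producer that never finishes on its own.  A minimal sketch, using `yes` as a stand-in for the formatter pipeline; the function name is made up for illustration.

```shell
# first_lines N: read N lines from an endless producer.  Once head has
# its N lines it exits, the next write by "yes" fails (SIGPIPE or EPIPE),
# and the pipeline ends instead of running forever.
first_lines() {
    yes 'output line' | head -n "$1"
}
```

Prefixing such a pipeline with `time` therefore measures roughly how long the producer takes to emit its first lines, not how long it would take to finish.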
 
> I apologize for not clarifying these points in my original post. I seem to
> have misjudged the standard by which contributions are assessed here.

A review of a couple dozen messages to the _groff_ mailing list, or the
comment logs on a similar number of Savannah tickets filed against _groff_,
will likely give you a gestalt of the tenor of the community.

Recall from comment #0 that you sited your proposed patch squarely within the
context of a ticket raised by Arch Linux users who complained about
_grotty_(1)'s performance with artificially constructed, absurdly lengthy
inputs.  That was a worthwhile bug report because it brought to light a, shall
we say, "pessimal" memory allocation strategy--a problem worth resolving
regardless of the nature of the input.  That didn't make formatting 50 copies
of the GCC or Bash man pages any more practical a scenario, though.  Those
were merely easy ways to pile up a large volume of idiomatic input to
stress-test the rendering pipeline.

I didn't know, then, that there was anyone on Earth who actually had a 9.8
megabyte man page of regularly structured but non-repeating material.  Now I
do.  This is a thing that NixOS does.  Okay.  I'm willing to work with its
users to make their experience more pleasant, if I can do so without
compromising what I think of as _groff_'s good qualities.  Hence the advice in
this response.

> I offer a reassurance - I have in fact spent some hours studying this
> codebase and its respective communications archives in the hopes of being
> able to provide a wanted, non-extractive contribution to you and your users.

Okay.  What do you mean by "non-extractive"?

I have some comments on this generated document.  Perhaps you can direct me to
the tool that produced it.


.TH "CONFIGURATION\&.NIX" "5" "01/01/1980" "NixOS" "NixOS Reference Pages"


Pointless, ineffectual use of the dummy character escape sequence.

In fact this nilpotent idiom is used with great frequency throughout the
document.  I guess whoever wrote the tool that generates these
configuration.nix files had/has an unsteady command of *roff.  I encourage
them to read the documentation or ask the _groff_ mailing list for assistance
in achieving their goals.  Here's one place the tool maintainer might start.

https://www.gnu.org/software/groff/manual/groff.html.node/Sentences.html

Incidentally, you might like to combine paginated rendering with additional
options to null out the page headers and footers, since anybody reading this
monster man page will know/recognize it.


$ nroff -Ww -d PT= -d BT= -r cR=0 -man configuration.nix.5 | less -R


_groff_man_(7):


   Hooks
     Two macros, both GNU extensions, are called internally by the groff
     man package to format page headers and footers and can be redefined
     by the administrator in a site’s man.local file (see section
     “Files” below).  The presentation of TH above describes the default
     headers and footers.  Because these macros are hooks for groff man
     internals, man pages have no reason to call them.  Such hook
     definitions typically consist of “sp” and “tl” requests.  PT
     furthermore has the responsibility of emitting a PDF bookmark after
     writing the first page header in a document.  Consult the existing
     implementations in an.tmac when drafting replacements.

     .BT    Set the page footer text (“bottom trap”).

     .PT    Set the page header text (“page trap”).

     To remove a page header or footer entirely, define the appropriate
     macro as empty rather than deleting it.


5 lines out of every 66 will still be blank.  Those are the page margins.
Maybe that is something we can parameterize.


.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" enable line breaks after slashes


Not the right way to do that.  _groff_ 1.24 defeats those measures at every
new paragraph.  Observe.


NAME
     configuration.nix - NixOS system configuration specification

DESCRIPTION
     The  file  /etc/nixos/configuration.nix contains the declarative specifica‐
     tion of your NixOS system configuration. The  command  nixos-rebuild  takes
     this file and realises the system configuration specified therein.

OPTIONS
     You can use the following options in configuration.nix.

     <imports = [ pkgs.ghostunnel.services.default ]>
         This  is  a modular service[1], which can be imported into a NixOS con‐
         figuration using the ‘system.services’[2] option.

         Type: submodule

          1. https://nixos.org/manual/nixos/unstable/#modular-services
          2. https://search.nixos.org/options?channel=unstable&show=system.ser‐
             vices&query=modular+service


See the adjustment and hyphenation happening?

Configure defaults using the `AD` string and `HY` register in...well, wherever
NixOS installs "man.local".  I've heard that NixOS doesn't use anything
resembling the FHS to organize the file system.

groff_man(7):


Options
     The following groff options set registers (with -r) and strings
     (with -d) recognized and used by the man macro package.  To ensure
     rendering consistent with output device capabilities and reader
     preferences, man pages should never manipulate them.

     -dAD=adjustment‐mode
              Set line adjustment to adjustment‐mode, which is typically
              “b” for adjustment to both margins (the default), or “l”
              for left alignment (ragged right margin).  Any valid
              argument to groff’s “ad” request may be used; see
              groff(7).
...
     -rHY=0   Disable automatic hyphenation.  Normally, it is
              enabled (1).  The hyphenation mode is determined by the
              groff locale; see section “Localization” of groff(7).
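
The same two settings can also be passed on the command line for a single invocation, which is a quick way to check the result before touching any site configuration.  A minimal sketch, assuming a POSIX shell: `ragged_man` is a hypothetical name, and `NROFF` is an override invented for this sketch.

```shell
# ragged_man: hypothetical helper that formats a man page with
# left-aligned (ragged-right) text and automatic hyphenation disabled,
# using the AD string and HY register described in groff_man(7) instead
# of "ad" and "nh" requests inside the page.
ragged_man() {
    ${NROFF:-nroff} -Ww -d AD=l -r HY=0 -man "$@"
}
```

For example, `ragged_man configuration.nix.5 | less -R` should show the DESCRIPTION paragraph without the padded spaces and the broken "specifica‐tion".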



.\" enable line breaks after slashes
.cflags 4 /


Not portable to traditional _troffs_, and I'm not sure about _mandoc_(1), but
okay.  It's less work than sticking hyphenless break point escape sequences
into file names.  This is a resourceful use of the language.

Anyway, I'll stop on that positive note, because I can see it might be the
last such occasion I'll have for a while, and I'm guaranteed to run out of
steam reviewing this document long before it finishes.  :)


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?68326>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
