Re: [PATCH v2] Support 2-digit \sNN only in compatibility mode.

Ingo Schwarze Sat, 04 Apr 2020 07:12:09 -0700

Hi Branden,

G. Branden Robinson wrote on Sat, Apr 04, 2020 at 05:29:38AM +1100:


> Second attempt.

I have no very strong opinion about this, so i don't strictly object
to changing it, if people here really consider it valuable to have.
Then again, I'm still not quite sure what exactly the point is.

 * We can discourage using \s12 and \s48 in the documentation
   without changing the language syntax and semantics, if we want
   to (that would probably make sense, as far as it is not already
   done).

 * We can throw a warning about \s36 (Consider using \s[36] because
   that's more readable.) and also about \s40 (Do you really want
   to print the digit zero in point size 4?  If so, consider saying
   \s[4]0 for clarity.) without changing any behaviour.

 * Either way, i would consider \s12 a legacy idiom, supported for
   backward compatibility only.  What is the point of making a
   feature more intuitive that is only supported for backward
   compatibility in the first place?

 * While the new interpretation of \s12 is formally rigorous, it
   makes extremely little sense for practical use.  Why would anybody
   ever want to print anything in point size 1?  Or even in point
   size 2 or 3?  That feels really useless.  Syntactic purity and
   simplicity of a language is indeed a goal, but not for its own
   sake, only in so far as it makes learning the language easier,
   and that's not the case here because no new users need to learn
   about \s12 anyway.

 * While i don't really see much benefit in making a legacy feature
   nicer when there already are better modern features providing
   the same functionality, there definitely is a cost: some
   documents will become harder to process in a correct way,
   and some people who value their finger memory more than code
   clarity will be annoyed for little gain.  It's definitely
   setting a trap for some experienced users.

 * "Compatibility mode", in general, is much less helpful than it
   usually seems.  Every time you think about it, you tend to only
   think about that one feature at hand and all seems fine in that
   mindset: if you want to old behaviour, switch the mode on, else
   leave it off.
   But software evolves gradually over the decades.  There isn't a
   single point in time such that before is legacy and afterwards
   is modern.  So the more you add to compatibility mode, the more
   often people will have the problem that they need to process
   documents that *require* compatibility mode in some respects,
   but only work *without* compatibility mode in other respects,
   because the documents were written at a time when some mildly
   modern features had already been introduced and could already
   be used freely (outside compatibility mode, that is), while at
   that same time, some antique feature were still fully supported
   that now begin to require compatibilty mode.  The more time goes
   by, the more severly broken the whole fundamentally flawed concept
   of compatibility mode will eventually become.  In that sense,
   the concept of compatibilty mode is a time bomb, or more precisely
   a smouldering fire that will slowly, but eventually develop into
   a considerable blaze.  The less it is touched, the less risk is
   added.


[...]
> From 682ba81d314497014f72f26a8c73ff4505a6eee9 Mon Sep 17 00:00:00 2001
> From: "G. Branden Robinson" <[email protected]>
> Date: Sat, 4 Apr 2020 05:14:09 +1100
> Subject: [PATCH] Support 2-digit \sNN only in compatibility mode.
> 
> * src/roff/troff/input.cpp (read_size): Move special-case
>   interpretation of single-digit point-size escapes with two digits to

The phrase "single-digit point-size escapes with two digits" sounds
confusing and oxymoronic.  Maybe "point size escapes of the form
\sNN", if this goes in?

>   compatibility mode (groff -C) only, and throw error diagnostic with
>   suggestion for remedy if encountered.
> 
>   The problem is that traditionally '\s36A' is interpreted as "set point
>   size to 36, then emit 'A'".  However, only values in the range 10-39
>   are handled specially; '\s40A' is interpreted as a four-point "0A".
>   This is unlike anything else in *roff grammar; see \*, \$, \f, \F, \g,
>   \k, \m, \M, \n, \V, and \Y.
> 
>   To anticipate objections: Why not throw only a warning?  Because there
>   isn't a warning category for supported but ambiguous syntax

Indeed, the groff warning categories are severely ill-designed.
Several of them are overly specific.  There is really no point in
switching something like "el" or "right-brace" on or off individually.
At the same time, even though the number of categories is way too
large, there are no categories for what really matters.  The most
important categories would be somewhat like these:

 1. critical syntax errors: the author's intent is unclear and the
    document will likely be considerably misformatted, or text from
    the source file is likely to be lost, not appearing in the output
    at all (could subsume char, delim, di, el, escape, input; and
    probably parts of missing, number, range, right-brace, scale)

 2. resource errors: the syntax is OK, but the document is likely
    to be severely mangled anyway, possibly also with information
    loss, because external resources are missing (file and font are
    typical examples)

 3. minor syntax errors: the author's intent is unclear but the
    consequences will likely be limited to minor ugliness of
    formatting; document content is unlikely to be lost (examples:
    missing - e.g. for requests of minor importance, like .af or .hcode
    number - e.g. when it occurs in a point size or similar context
    scale - maybe in some situations  ... etc.

 4. layout warnings: there is no syntax error and the author's intent is
    clear, but it just doesn't work well (break is a typical example;
    others would be filling problems like unusually wide inter-word
    distances in one line of a narrow column, or one isolated line of
    a paragraph in the first or last line of a page or column, or a
    single short word wrapping over to a new line).

 5. style warnings: intent is clear and formatting will be just fine,
    but maybe the author should have a look anyway because there
    may be a better way to do the same, or the situation may or may not
    indicate some oversight (ig, mac, reg, space could be subsumed here,
    and maybe syntax or parts of it; this is where a warning about
    \s48 in compat mode or about \s12 in non-compat mode would belong)

Of course it need not be exactly like this.  There could be fewer
categories (e.g. one could combine 1+2 and/or 3+4), or some might
want to split a category (e.g. 1 could be split into character-related,
programming-related, layout-related), but this is how warning categories
ought to be designed, in principle.

I have no idea how to deal with the current mess.  Just add it to
whatever category comes closest, even if it doesn't really fit?

In any case, *please* do not use shortcomings of the GNU troff
specific message system as an argument for or against any syntactic
or semantic choices regarding the roff(7) language at large.

>   (this
>   behavior of AT&T troff dates to 1976 but apparently was not documented
>   until 1992).  Why not throw the warning outside of compatibility mode
>   too?  Because outside of compatibility mode we (now) have an
>   unambiguous parse.

Right, but the meaning of the what was parsed makes absolutely no
sense, so it would be clearly useful to warn about \s12 in the
new world:  what it does is almost certainly not what the author
intended, and with such a small point size, the misformatting
will even be quite severe, likely producing unreadable text.


[...]
> diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
> index 6f35dadf..d21bd698 100644
> --- a/src/roff/troff/input.cpp
> +++ b/src/roff/troff/input.cpp
> @@ -5076,13 +5076,30 @@ static int read_size(int *x)
>    }
>    else if (csdigit(c)) {
>      val = c - '0';
> -    if (!inc && c != '0' && c < '4') {
> +    if (compatible_flag && !inc && c != '0' && c < '4') {
> +      // Note: Very special case!  If we have \s followed immediately by
> +      // a digit (not '(', '+', or '-'), and that digit is 1, 2, or
> +      // 3...read another digit!  This is because the Graphic Systems
> +      // C/A/T phototypesetter (the original device target for AT&T
> +      // troff) only supported a few discrete point sizes in the range
> +      // 6..36.  Kernighan warned of this in the 1992 revision of CSTR
> +      // #54 (section 2.3), and more recently, McIlroy referred to it as
> +      // a "living fossil".  This DWIM syntax is surprising to the user;
> +      // it clashes with the syntax of several other escapes (\*, \$,
> +      // \f, \F, \g, \k, \m, \M, \n, \V, and \Y).  We therefore support
> +      // it only in compatibility mode.
> +      //
> +      // See:
> +      //  https://lists.gnu.org/archive/html/groff/2020-03/msg00054.html
> +      // et seq.

This comment feels quite excessive to me.  Very long comments may
occasionally make sense above functions that are very long and very
complicated, but you clearly shouldn't comment a five-line if block
in the middle of a relatively simple, straight-forward function
with a 15-line comment, in particular if the block is neither
all that important nor hard to understand.  Such a long comment
makes the actual code hard to find, and makes the file hard to read
because you have to scroll so much.

"Note: Very special case!" says nothing at all.

"If we have .. read another digit" and "We therefore support it
only in compatibility mode" is totally obvious in the first place:

        i++;  /* increment the variable i by one */

"This is because ... living fossil" may be appropriate in a commit
message, but not as a comment, and in the commit message, a reference
to the mailing list discussion may nearly suffice.

"This DWIM syntax is surprising" doesn't belong in the code at all;
we might want to warn users in the manual page, though.

I don't think any comment is really needed at all because the
code is very obvious and straightforward.  If you absolutely
want to say something,

        // Support the legacy form \s10 to \s39.

or something like that should be sufficient.

Yours,
  Ingo

Re: [PATCH v2] Support 2-digit \sNN only in compatibility mode.

Reply via email to