Hi Branden, G. Branden Robinson wrote on Sat, Apr 04, 2020 at 05:29:38AM +1100:
> Second attempt. I have no very strong opinion about this, so i don't strictly object to changing it, if people here really consider it valuable to have. Then again, I'm still not quite sure what exactly the point is. * We can discourage using \s12 and \s48 in the documentation without changing the language syntax and semantics, if we want to (that would probably make sense, as far as it is not already done). * We can throw a warning about \s36 (Consider using \s[36] because that's more readable.) and also about \s40 (Do you really want to print the digit zero in point size 4? If so, consider saying \s[4]0 for clarity.) without changing any behaviour. * Either way, i would consider \s12 a legacy idiom, supported for backward compatibility only. What is the point of making a feature more intuitive that is only supported for backward compatibility in the first place? * While the new interpretation of \s12 is formally rigorous, it makes extremely little sense for practical use. Why would anybody ever want to print anything in point size 1? Or even in point size 2 or 3? That feels really useless. Syntactic purity and simplicity of a language is indeed a goal, but not for its own sake, only in so far as it makes learning the language easier, and that's not the case here because no new users need to learn about \s12 anyway. * While i don't really see much benefit in making a legacy feature nicer when there already are better modern features providing the same functionality, there definitely is a cost: some documents will become harder to process in a correct way, and some people who value their finger memory more than code clarity will be annoyed for little gain. It's definitely setting a trap for some experienced users. * "Compatibility mode", in general, is much less helpful than it usually seems. Every time you think about it, you tend to only think about that one feature at hand and all seems fine in that mindset: if you want to old behaviour, switch the mode on, else leave it off. But software evolves gradually over the decades. There isn't a single point in time such that before is legacy and afterwards is modern. So the more you add to compatibility mode, the more often people will have the problem that they need to process documents that *require* compatibility mode in some respects, but only work *without* compatibility mode in other respects, because the documents were written at a time when some mildly modern features had already been introduced and could already be used freely (outside compatibility mode, that is), while at that same time, some antique feature were still fully supported that now begin to require compatibilty mode. The more time goes by, the more severly broken the whole fundamentally flawed concept of compatibility mode will eventually become. In that sense, the concept of compatibilty mode is a time bomb, or more precisely a smouldering fire that will slowly, but eventually develop into a considerable blaze. The less it is touched, the less risk is added. [...] > From 682ba81d314497014f72f26a8c73ff4505a6eee9 Mon Sep 17 00:00:00 2001 > From: "G. Branden Robinson" <[email protected]> > Date: Sat, 4 Apr 2020 05:14:09 +1100 > Subject: [PATCH] Support 2-digit \sNN only in compatibility mode. > > * src/roff/troff/input.cpp (read_size): Move special-case > interpretation of single-digit point-size escapes with two digits to The phrase "single-digit point-size escapes with two digits" sounds confusing and oxymoronic. Maybe "point size escapes of the form \sNN", if this goes in? > compatibility mode (groff -C) only, and throw error diagnostic with > suggestion for remedy if encountered. > > The problem is that traditionally '\s36A' is interpreted as "set point > size to 36, then emit 'A'". However, only values in the range 10-39 > are handled specially; '\s40A' is interpreted as a four-point "0A". > This is unlike anything else in *roff grammar; see \*, \$, \f, \F, \g, > \k, \m, \M, \n, \V, and \Y. > > To anticipate objections: Why not throw only a warning? Because there > isn't a warning category for supported but ambiguous syntax Indeed, the groff warning categories are severely ill-designed. Several of them are overly specific. There is really no point in switching something like "el" or "right-brace" on or off individually. At the same time, even though the number of categories is way too large, there are no categories for what really matters. The most important categories would be somewhat like these: 1. critical syntax errors: the author's intent is unclear and the document will likely be considerably misformatted, or text from the source file is likely to be lost, not appearing in the output at all (could subsume char, delim, di, el, escape, input; and probably parts of missing, number, range, right-brace, scale) 2. resource errors: the syntax is OK, but the document is likely to be severely mangled anyway, possibly also with information loss, because external resources are missing (file and font are typical examples) 3. minor syntax errors: the author's intent is unclear but the consequences will likely be limited to minor ugliness of formatting; document content is unlikely to be lost (examples: missing - e.g. for requests of minor importance, like .af or .hcode number - e.g. when it occurs in a point size or similar context scale - maybe in some situations ... etc. 4. layout warnings: there is no syntax error and the author's intent is clear, but it just doesn't work well (break is a typical example; others would be filling problems like unusually wide inter-word distances in one line of a narrow column, or one isolated line of a paragraph in the first or last line of a page or column, or a single short word wrapping over to a new line). 5. style warnings: intent is clear and formatting will be just fine, but maybe the author should have a look anyway because there may be a better way to do the same, or the situation may or may not indicate some oversight (ig, mac, reg, space could be subsumed here, and maybe syntax or parts of it; this is where a warning about \s48 in compat mode or about \s12 in non-compat mode would belong) Of course it need not be exactly like this. There could be fewer categories (e.g. one could combine 1+2 and/or 3+4), or some might want to split a category (e.g. 1 could be split into character-related, programming-related, layout-related), but this is how warning categories ought to be designed, in principle. I have no idea how to deal with the current mess. Just add it to whatever category comes closest, even if it doesn't really fit? In any case, *please* do not use shortcomings of the GNU troff specific message system as an argument for or against any syntactic or semantic choices regarding the roff(7) language at large. > (this > behavior of AT&T troff dates to 1976 but apparently was not documented > until 1992). Why not throw the warning outside of compatibility mode > too? Because outside of compatibility mode we (now) have an > unambiguous parse. Right, but the meaning of the what was parsed makes absolutely no sense, so it would be clearly useful to warn about \s12 in the new world: what it does is almost certainly not what the author intended, and with such a small point size, the misformatting will even be quite severe, likely producing unreadable text. [...] > diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp > index 6f35dadf..d21bd698 100644 > --- a/src/roff/troff/input.cpp > +++ b/src/roff/troff/input.cpp > @@ -5076,13 +5076,30 @@ static int read_size(int *x) > } > else if (csdigit(c)) { > val = c - '0'; > - if (!inc && c != '0' && c < '4') { > + if (compatible_flag && !inc && c != '0' && c < '4') { > + // Note: Very special case! If we have \s followed immediately by > + // a digit (not '(', '+', or '-'), and that digit is 1, 2, or > + // 3...read another digit! This is because the Graphic Systems > + // C/A/T phototypesetter (the original device target for AT&T > + // troff) only supported a few discrete point sizes in the range > + // 6..36. Kernighan warned of this in the 1992 revision of CSTR > + // #54 (section 2.3), and more recently, McIlroy referred to it as > + // a "living fossil". This DWIM syntax is surprising to the user; > + // it clashes with the syntax of several other escapes (\*, \$, > + // \f, \F, \g, \k, \m, \M, \n, \V, and \Y). We therefore support > + // it only in compatibility mode. > + // > + // See: > + // https://lists.gnu.org/archive/html/groff/2020-03/msg00054.html > + // et seq. This comment feels quite excessive to me. Very long comments may occasionally make sense above functions that are very long and very complicated, but you clearly shouldn't comment a five-line if block in the middle of a relatively simple, straight-forward function with a 15-line comment, in particular if the block is neither all that important nor hard to understand. Such a long comment makes the actual code hard to find, and makes the file hard to read because you have to scroll so much. "Note: Very special case!" says nothing at all. "If we have .. read another digit" and "We therefore support it only in compatibility mode" is totally obvious in the first place: i++; /* increment the variable i by one */ "This is because ... living fossil" may be appropriate in a commit message, but not as a comment, and in the commit message, a reference to the mailing list discussion may nearly suffice. "This DWIM syntax is surprising" doesn't belong in the code at all; we might want to warn users in the manual page, though. I don't think any comment is really needed at all because the code is very obvious and straightforward. If you absolutely want to say something, // Support the legacy form \s10 to \s39. or something like that should be sufficient. Yours, Ingo
