gbranden pushed a commit to branch master
in repository groff.
commit 3dc11a61aae17cddc336939e8b8374dbeb6edbc7
Author: G. Branden Robinson <[email protected]>
AuthorDate: Sat Apr 18 14:40:10 2026 -0500
src/roff/troff/env.cpp: Fix Savannah #68252.
* src/roff/troff/env.cpp (read_font_identifier): Rewrite to fix Savannah
#68252, an assertion failure I introduced in commit 8b83fd3dd8, 2
November, in
"src/roff/troff/input.cpp:token::is_usable_as_delimiter()".
Time to fire up the development mailing list critical homunculus and
have a Socratic dialogue with it.
"Rewrite?" Why hit such a small thing with such a big hammer? Why
not just fix my stupid mistake of adding an assertion? {A} I
couldn't--wouldn't--just rip the assertion out, because it's in a
hot-path function where much sanity checking takes place. {B} This
code implicates one of the more challenging parts of the *roff
language grammar: (almost) anywhere you can read a font identifier
like `B`, `TR`, or `MARIGOLD`, it's also valid to read a font mounting
position, a nonnegative integer. In *roff, nonnegative integers also
happen to be valid identifier names! You can have a register named
`55` or a string/macro/diversion named `10` {hello, eqn(1)}. {C} This
function is called only by uncommonly used request handlers, those for
`uf`, `fschar`, `rfschar`, `special`, `fspecial`, `fzoom`, `bd`, and
`tkf`. {It's a rare groff user who can say with confidence what _all_
of those requests do.} "But wait," you say, "isn't this same kind of
font lookup done wildly commonly, like with every `ft` request or `\f`
escape sequence? {D} Yes! So there's good news and bad news, and
they're the same news. There is logic to do exactly that--to
disambiguate and resolve font mounting positions and identifiers
behind that request and that escape sequence. It lives,
duplicatively, in another function in a different source file. {E}
"But wait," you say, "why was this function, a supporting function for
request handlers that read only space-delimited arguments, calling
`token::is_usable_as_delimiter()`?" Good question! I don't know, but
it's been doing that, modulus the renamings for which I fear I am
becoming notorious, at least since groff 1.22.3 in November 2014.
Within the request handlers, we aren't in a delimited input context at
all--as the GNU troff code base uses that term--so it's a mystery.
`token::is_usable_as_delimiter()` used to be known as
`token::delimiter()`. I invite the reader to compare and
contrast the
following.
https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/\
troff/node.cpp?h=1.22.3#n6163
https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/troff/\
node.c?h=1.02#n4373
Why not just have `read_font_identifier()` call `select_font()` over
in "env.cpp"? Side effects. As with many GNU troff functions,
parsing logic is entangled with engine logic; see Savannah #68240. So
instead I copy the lexical analysis bits of `select_font()` into this
function, replacing most of it. How can I be confident this will work
well? Apart from a build proceeding successfully and passing our test
suite (now with 340 automated tests), the `select_font()` logic is a
hotter code path and, I suspect, always has been. After my last push
earlier this week, I gathered coverage statistics from a full build
with tests. `read_font_identifier() was hit 12,023 times. That seems
like a lot, right?
`select_font()` was hit 254,489 times. If you can't successfully
change fonts in GNU troff, a lot more people (and test scripts of
ours) will notice than can observe malfunctions of the `uf`, `fschar`,
`rfschar`, `special`, `fspecial`, `fzoom`, `bd`, or `tkf` requests.
(And, in fact, `bd` _doesn't_ work well. See Savannah #68256.) So if
I have to violate DRY, I'm cribbing the battle-hardened code.
Moreover there's no reason at all, on any principle, to be checking
for delimiters in this input context, as noted above.
Fixes <https://savannah.gnu.org/bugs/?68252>. The root cause of the
problem depends on what you consider the defect to be. As noted
above, the assertion came in with commit 8b83fd3dd8, 2 November,
affecting groff 1.24.0. But the code path _containing_ the assertion
should never have been taken in the first place by the request
handlers for `bd` et al.--yet it has been, ever since groff's birth.
Until now.
---
ChangeLog | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
src/roff/troff/env.cpp | 4 +++
src/roff/troff/node.cpp | 50 ++++++++++++++++++++++++------
3 files changed, 127 insertions(+), 9 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 5b894462d..61ea297a6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,85 @@
+2026-04-18 G. Branden Robinson <[email protected]>
+
+ * src/roff/troff/env.cpp (read_font_identifier): Rewrite to fix
+ Savannah #68252, an assertion failure I introduced in commit
+ 8b83fd3dd8, 2 November, in "src/roff/troff/input.cpp:token::
+ is_usable_as_delimiter()".
+
+ Time to fire up the development mailing list critical homunculus
+ and have a Socratic dialogue with it.
+
+ "Rewrite?" Why hit such a small thing with such a big hammer?
+ Why not just fix my stupid mistake of adding an assertion? {A}
+ I couldn't--wouldn't--just rip the assertion out, because it's
+ in a hot-path function where much sanity checking takes place.
+ {B} This code implicates one of the more challenging parts of
+ the *roff language grammar: (almost) anywhere you can read a
+ font identifier like `B`, `TR`, or `MARIGOLD`, it's also valid
+ to read a font mounting position, a nonnegative integer. In
+ *roff, nonnegative integers also happen to be valid identifier
+ names! You can have a register named `55` or a string/macro/
+ diversion named `10` {hello, eqn(1)}. {C} This function is
+ called only by uncommonly used request handlers, those for `uf`,
+ `fschar`, `rfschar`, `special`, `fspecial`, `fzoom`, `bd`, and
+ `tkf`. {It's a rare groff user who can say with confidence what
+ _all_ of those requests do.} "But wait," you say, "isn't this
+ same kind of font lookup done wildly commonly, like with every
+ `ft` request or `\f` escape sequence? {D} Yes! So there's good
+ news and bad news, and they're the same news. There is logic to
+ do exactly that--to disambiguate and resolve font mounting
+ positions and identifiers behind that request and that escape
+ sequence. It lives, duplicatively, in another function in a
+ different source file. {E} "But wait," you say, "why was this
+ function, a supporting function for request handlers that read
+ only space-delimited arguments, calling `token::
+ is_usable_as_delimiter()`?" Good question! I don't know, but
+ it's been doing that, modulus the renamings for which I fear I
+ am becoming notorious, at least since groff 1.22.3 in November
+ 2014. Within the request handlers, we aren't in a delimited
+ input context at all--as the GNU troff code base uses that
+ term--so it's a mystery.
+
+ `token::is_usable_as_delimiter()` used to be known as
+ `token::delimiter()`. I invite the reader to compare and
+ contrast the following.
+
+ https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/\
+ troff/node.cpp?h=1.22.3#n6163
+
+ https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/troff/\
+ node.c?h=1.02#n4373
+
+ Why not just have `read_font_identifier()` call `select_font()`
+ over in "env.cpp"? Side effects. As with many GNU troff
+ functions, parsing logic is entangled with engine logic; see
+ Savannah #68240. So instead I copy the lexical analysis bits of
+ `select_font()` into this function, replacing most of it. How
+ can I be confident this will work well? Apart from a build
+ proceeding successfully and passing our test suite (now with 340
+ automated tests), the `select_font()` logic is a hotter code
+ path and, I suspect, always has been. After my last push
+ earlier this week, I gathered coverage statistics from a full
+ build with tests. `read_font_identifier() was hit 12,023 times.
+ That seems like a lot, right?
+
+ `select_font()` was hit 254,489 times. If you can't
+ successfully change fonts in GNU troff, a lot more people (and
+ test scripts of ours) will notice than can observe malfunctions
+ of the `uf`, `fschar`, `rfschar`, `special`, `fspecial`,
+ `fzoom`, `bd`, or `tkf` requests. (And, in fact, `bd` _doesn't_
+ work well. See Savannah #68256.) So if I have to violate DRY,
+ I'm cribbing the battle-hardened code. Moreover there's no
+ reason at all, on any principle, to be checking for delimiters
+ in this input context, as noted above.
+
+ Fixes <https://savannah.gnu.org/bugs/?68252>. The root cause of
+ the problem depends on what you consider the defect to be. As
+ noted above, the assertion came in with commit 8b83fd3dd8, 2
+ November, affecting groff 1.24.0. But the code path
+ _containing_ the assertion should never have been taken in the
+ first place by the request handlers for `bd` et al.--yet it has
+ been, ever since groff's birth. Until now.
+
2026-04-18 G. Branden Robinson <[email protected]>
* src/roff/troff/node.cpp (read_font_identifier): Trivially
diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index d96e2908d..3d7029355 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -1331,6 +1331,10 @@ static void select_stroke_color_request() // .gcolor
static symbol P_symbol("P");
// Select font with name or mounting position `s`.
+//
+// TODO: This duplicates logic from node.cpp:read_font_identifier().
+// Refactor. Need read_font_mounting_position_or_identifier(), storing
+// the resolved mounting position in an argument.
void select_font(symbol s)
{
bool is_number = true;
diff --git a/src/roff/troff/node.cpp b/src/roff/troff/node.cpp
index 74dd1c549..ee85d5cbf 100644
--- a/src/roff/troff/node.cpp
+++ b/src/roff/troff/node.cpp
@@ -7002,15 +7002,52 @@ bool is_valid_font(int n)
// Read the next token and look it up as a font name or position number.
// Return lookup success. Store, in the supplied struct argument, the
// requested name or position, and the position actually resolved.
+//
+// TODO: This duplicates logic from env.cpp:select_font(). Refactor.
+// Need read_font_mounting_position_or_identifier(), storing the
+// resolved mounting position in an argument.
static bool read_font_identifier(font_lookup_info *finfo)
{
- int mp;
tok.skip_spaces();
- if (tok.is_usable_as_delimiter()) {
- symbol s = read_identifier(true /* want_diagnostic */);
+ // Painful. We read the whole argument as a symbol, then see if it's
+ // interpretable as an (unsigned) decimal integer. If is, we treat it
+ // as a mounting position. If not, we treat it as a font "name".
+ symbol s = read_identifier();
+ bool is_number = true;
+ assert(!s.is_null() && !s.is_empty());
+ if (!s.is_null() && !s.is_empty()) {
+ const char *p = s.contents();
+ assert(*p != 0 /* nullptr */);
+ // Silently ignore a leading minus sign so we can issue a range
+ // warning later.
+ if ((csdigit(*p)) || ('-' == *p))
+ p++;
+ for (; (p != 0 /* nullptr */) && (*p != '\0'); p++) {
+ if (!csdigit(*p)) {
+ is_number = false;
+ break;
+ }
+ }
+ }
+ if (is_number) {
+ errno = 0;
+ long val = strtol(s.contents(), NULL, 10);
+ if ((ERANGE == errno) || (val > INT_MAX) || (val < 0)) {
+ warning(WARN_RANGE, "font mounting position must be in range"
+ " 0..%1, got %2", INT_MAX, s.contents());
+ return false;
+ }
+ int mp = int(val);
+ if (!is_valid_font_mounting_position(mp)) {
+ warning(WARN_FONT, "no font mounted at position %1", mp);
+ return false;
+ }
+ finfo->position = curenv->get_family()->resolve(mp);
+ }
+ else {
finfo->requested_name = const_cast<char *>(s.contents());
if (!s.is_null()) {
- mp = mounting_position_of_font(s);
+ int mp = mounting_position_of_font(s);
if (mp < 0) {
mp = next_available_font_mounting_position();
if (mount_font_at_position(s, mp))
@@ -7019,11 +7056,6 @@ static bool read_font_identifier(font_lookup_info *finfo)
finfo->position = curenv->get_family()->resolve(mp);
}
}
- else if (read_integer(&mp)) {
- finfo->requested_position = mp;
- if (is_valid_font_mounting_position(mp))
- finfo->position = curenv->get_family()->resolve(mp);
- }
return (finfo->position != FONT_NOT_MOUNTED);
}
_______________________________________________
groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit