gbranden pushed a commit to branch master
in repository groff.

commit 3dc11a61aae17cddc336939e8b8374dbeb6edbc7
Author: G. Branden Robinson <[email protected]>
AuthorDate: Sat Apr 18 14:40:10 2026 -0500

    src/roff/troff/env.cpp: Fix Savannah #68252.
    
    * src/roff/troff/env.cpp (read_font_identifier): Rewrite to fix Savannah
      #68252, an assertion failure I introduced in commit 8b83fd3dd8, 2
      November, in
      "src/roff/troff/input.cpp:token::is_usable_as_delimiter()".
    
      Time to fire up the development mailing list critical homunculus and
      have a Socratic dialogue with it.
    
      "Rewrite?"  Why hit such a small thing with such a big hammer?  Why
      not just fix my stupid mistake of adding an assertion?  {A} I
      couldn't--wouldn't--just rip the assertion out, because it's in a
      hot-path function where much sanity checking takes place.  {B} This
      code implicates one of the more challenging parts of the *roff
      language grammar: (almost) anywhere you can read a font identifier
      like `B`, `TR`, or `MARIGOLD`, it's also valid to read a font mounting
      position, a nonnegative integer.  In *roff, nonnegative integers also
      happen to be valid identifier names!  You can have a register named
      `55` or a string/macro/diversion named `10` {hello, eqn(1)}.  {C} This
      function is called only by uncommonly used request handlers, those for
      `uf`, `fschar`, `rfschar`, `special`, `fspecial`, `fzoom`, `bd`, and
      `tkf`.  {It's a rare groff user who can say with confidence what _all_
      of those requests do.}  "But wait," you say, "isn't this same kind of
      font lookup done wildly commonly, like with every `ft` request or `\f`
      escape sequence?  {D} Yes!  So there's good news and bad news, and
      they're the same news.  There is logic to do exactly that--to
      disambiguate and resolve font mounting positions and identifiers
      behind that request and that escape sequence.  It lives,
      duplicatively, in another function in a different source file.  {E}
      "But wait," you say, "why was this function, a supporting function for
      request handlers that read only space-delimited arguments, calling
      `token::is_usable_as_delimiter()`?"  Good question!  I don't know, but
      it's been doing that, modulus the renamings for which I fear I am
      becoming notorious, at least since groff 1.22.3 in November 2014.
      Within the request handlers, we aren't in a delimited input context at
      all--as the GNU troff code base uses that term--so it's a mystery.
    
      `token::is_usable_as_delimiter()` used to be known as
              `token::delimiter()`.  I invite the reader to compare and 
contrast the
              following.
    
      https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/\
      troff/node.cpp?h=1.22.3#n6163
    
      https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/troff/\
      node.c?h=1.02#n4373
    
      Why not just have `read_font_identifier()` call `select_font()` over
      in "env.cpp"?  Side effects.  As with many GNU troff functions,
      parsing logic is entangled with engine logic; see Savannah #68240.  So
      instead I copy the lexical analysis bits of `select_font()` into this
      function, replacing most of it.  How can I be confident this will work
      well?  Apart from a build proceeding successfully and passing our test
      suite (now with 340 automated tests), the `select_font()` logic is a
      hotter code path and, I suspect, always has been.  After my last push
      earlier this week, I gathered coverage statistics from a full build
      with tests.  `read_font_identifier() was hit 12,023 times.  That seems
      like a lot, right?
    
      `select_font()` was hit 254,489 times.  If you can't successfully
      change fonts in GNU troff, a lot more people (and test scripts of
      ours) will notice than can observe malfunctions of the `uf`, `fschar`,
      `rfschar`, `special`, `fspecial`, `fzoom`, `bd`, or `tkf` requests.
      (And, in fact, `bd` _doesn't_ work well.  See Savannah #68256.)  So if
      I have to violate DRY, I'm cribbing the battle-hardened code.
      Moreover there's no reason at all, on any principle, to be checking
      for delimiters in this input context, as noted above.
    
      Fixes <https://savannah.gnu.org/bugs/?68252>.  The root cause of the
      problem depends on what you consider the defect to be.  As noted
      above, the assertion came in with commit 8b83fd3dd8, 2 November,
      affecting groff 1.24.0.  But the code path _containing_ the assertion
      should never have been taken in the first place by the request
      handlers for `bd` et al.--yet it has been, ever since groff's birth.
      Until now.
---
 ChangeLog               | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
 src/roff/troff/env.cpp  |  4 +++
 src/roff/troff/node.cpp | 50 ++++++++++++++++++++++++------
 3 files changed, 127 insertions(+), 9 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 5b894462d..61ea297a6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,85 @@
+2026-04-18  G. Branden Robinson <[email protected]>
+
+       * src/roff/troff/env.cpp (read_font_identifier): Rewrite to fix
+       Savannah #68252, an assertion failure I introduced in commit
+       8b83fd3dd8, 2 November, in "src/roff/troff/input.cpp:token::
+       is_usable_as_delimiter()".
+
+       Time to fire up the development mailing list critical homunculus
+       and have a Socratic dialogue with it.
+
+       "Rewrite?"  Why hit such a small thing with such a big hammer?
+       Why not just fix my stupid mistake of adding an assertion?  {A}
+       I couldn't--wouldn't--just rip the assertion out, because it's
+       in a hot-path function where much sanity checking takes place.
+       {B} This code implicates one of the more challenging parts of
+       the *roff language grammar: (almost) anywhere you can read a
+       font identifier like `B`, `TR`, or `MARIGOLD`, it's also valid
+       to read a font mounting position, a nonnegative integer.  In
+       *roff, nonnegative integers also happen to be valid identifier
+       names!  You can have a register named `55` or a string/macro/
+       diversion named `10` {hello, eqn(1)}.  {C} This function is
+       called only by uncommonly used request handlers, those for `uf`,
+       `fschar`, `rfschar`, `special`, `fspecial`, `fzoom`, `bd`, and
+       `tkf`.  {It's a rare groff user who can say with confidence what
+       _all_ of those requests do.}  "But wait," you say, "isn't this
+       same kind of font lookup done wildly commonly, like with every
+       `ft` request or `\f` escape sequence?  {D} Yes!  So there's good
+       news and bad news, and they're the same news.  There is logic to
+       do exactly that--to disambiguate and resolve font mounting
+       positions and identifiers behind that request and that escape
+       sequence.  It lives, duplicatively, in another function in a
+       different source file.  {E} "But wait," you say, "why was this
+       function, a supporting function for request handlers that read
+       only space-delimited arguments, calling `token::
+       is_usable_as_delimiter()`?"  Good question!  I don't know, but
+       it's been doing that, modulus the renamings for which I fear I
+       am becoming notorious, at least since groff 1.22.3 in November
+       2014.  Within the request handlers, we aren't in a delimited
+       input context at all--as the GNU troff code base uses that
+       term--so it's a mystery.
+
+       `token::is_usable_as_delimiter()` used to be known as
+       `token::delimiter()`.  I invite the reader to compare and
+       contrast the following.
+
+       https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/\
+       troff/node.cpp?h=1.22.3#n6163
+
+       https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/troff/\
+       node.c?h=1.02#n4373
+
+       Why not just have `read_font_identifier()` call `select_font()`
+       over in "env.cpp"?  Side effects.  As with many GNU troff
+       functions, parsing logic is entangled with engine logic; see
+       Savannah #68240.  So instead I copy the lexical analysis bits of
+       `select_font()` into this function, replacing most of it.  How
+       can I be confident this will work well?  Apart from a build
+       proceeding successfully and passing our test suite (now with 340
+       automated tests), the `select_font()` logic is a hotter code
+       path and, I suspect, always has been.  After my last push
+       earlier this week, I gathered coverage statistics from a full
+       build with tests.  `read_font_identifier() was hit 12,023 times.
+       That seems like a lot, right?
+
+       `select_font()` was hit 254,489 times.  If you can't
+       successfully change fonts in GNU troff, a lot more people (and
+       test scripts of ours) will notice than can observe malfunctions
+       of the `uf`, `fschar`, `rfschar`, `special`, `fspecial`,
+       `fzoom`, `bd`, or `tkf` requests.  (And, in fact, `bd` _doesn't_
+       work well.  See Savannah #68256.)  So if I have to violate DRY,
+       I'm cribbing the battle-hardened code.  Moreover there's no
+       reason at all, on any principle, to be checking for delimiters
+       in this input context, as noted above.
+
+       Fixes <https://savannah.gnu.org/bugs/?68252>.  The root cause of
+       the problem depends on what you consider the defect to be.  As
+       noted above, the assertion came in with commit 8b83fd3dd8, 2
+       November, affecting groff 1.24.0.  But the code path
+       _containing_ the assertion should never have been taken in the
+       first place by the request handlers for `bd` et al.--yet it has
+       been, ever since groff's birth.  Until now.
+
 2026-04-18  G. Branden Robinson <[email protected]>
 
        * src/roff/troff/node.cpp (read_font_identifier): Trivially
diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index d96e2908d..3d7029355 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -1331,6 +1331,10 @@ static void select_stroke_color_request() // .gcolor
 static symbol P_symbol("P");
 
 // Select font with name or mounting position `s`.
+//
+// TODO: This duplicates logic from node.cpp:read_font_identifier().
+// Refactor.  Need read_font_mounting_position_or_identifier(), storing
+// the resolved mounting position in an argument.
 void select_font(symbol s)
 {
   bool is_number = true;
diff --git a/src/roff/troff/node.cpp b/src/roff/troff/node.cpp
index 74dd1c549..ee85d5cbf 100644
--- a/src/roff/troff/node.cpp
+++ b/src/roff/troff/node.cpp
@@ -7002,15 +7002,52 @@ bool is_valid_font(int n)
 // Read the next token and look it up as a font name or position number.
 // Return lookup success.  Store, in the supplied struct argument, the
 // requested name or position, and the position actually resolved.
+//
+// TODO: This duplicates logic from env.cpp:select_font().  Refactor.
+// Need read_font_mounting_position_or_identifier(), storing the
+// resolved mounting position in an argument.
 static bool read_font_identifier(font_lookup_info *finfo)
 {
-  int mp;
   tok.skip_spaces();
-  if (tok.is_usable_as_delimiter()) {
-    symbol s = read_identifier(true /* want_diagnostic */);
+  // Painful.  We read the whole argument as a symbol, then see if it's
+  // interpretable as an (unsigned) decimal integer.  If is, we treat it
+  // as a mounting position.  If not, we treat it as a font "name".
+  symbol s = read_identifier();
+  bool is_number = true;
+  assert(!s.is_null() && !s.is_empty());
+  if (!s.is_null() && !s.is_empty()) {
+    const char *p = s.contents();
+    assert(*p != 0 /* nullptr */);
+    // Silently ignore a leading minus sign so we can issue a range
+    // warning later.
+    if ((csdigit(*p)) || ('-' == *p))
+      p++;
+    for (; (p != 0 /* nullptr */) && (*p != '\0'); p++) {
+      if (!csdigit(*p)) {
+       is_number = false;
+       break;
+      }
+    }
+  }
+  if (is_number) {
+    errno = 0;
+    long val = strtol(s.contents(), NULL, 10);
+    if ((ERANGE == errno) || (val > INT_MAX) || (val < 0)) {
+      warning(WARN_RANGE, "font mounting position must be in range"
+             " 0..%1, got %2", INT_MAX, s.contents());
+      return false;
+    }
+    int mp = int(val);
+    if (!is_valid_font_mounting_position(mp)) {
+      warning(WARN_FONT, "no font mounted at position %1", mp);
+      return false;
+    }
+    finfo->position = curenv->get_family()->resolve(mp);
+  }
+  else {
     finfo->requested_name = const_cast<char *>(s.contents());
     if (!s.is_null()) {
-      mp = mounting_position_of_font(s);
+      int mp = mounting_position_of_font(s);
       if (mp < 0) {
        mp = next_available_font_mounting_position();
        if (mount_font_at_position(s, mp))
@@ -7019,11 +7056,6 @@ static bool read_font_identifier(font_lookup_info *finfo)
       finfo->position = curenv->get_family()->resolve(mp);
     }
   }
-  else if (read_integer(&mp)) {
-    finfo->requested_position = mp;
-    if (is_valid_font_mounting_position(mp))
-      finfo->position = curenv->get_family()->resolve(mp);
-  }
   return (finfo->position != FONT_NOT_MOUNTED);
 }
 

_______________________________________________
groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit

Reply via email to