[bug #67571] [troff] `class` request works or not depending on where in the input it's called

G. Branden Robinson Mon, 17 Nov 2025 13:58:02 -0800

Follow-up Comment #20, bug #67571 (group groff):

Looks like I get to pivot to a **third** position.


Here's a summary of my working copy at a point of bisection.  I'll explain why
I'm bisecting in a moment.


$ git log --oneline --reverse origin..HEAD
632b9d7b6 ChangeLog: Fix component annotations.
75f3d1db3 [doc,man]: Clarify "Other differences" discussion.
f290bb60d src/roff/troff/input.cpp: Fix grammar in comment.
b111f6f08 src/libs/libgroff/glyphuni.cpp: Correct comment.
3c770e337 [troff]: Add `.class` read-only Boolean register.
9f5e58272 [troff]: Track character class defn locations.
6ed81442c (refs/bisect/good-6ed81442c60835e544de183cee202435046bd17b) [troff]: Describe character classes better (1/2).
71b13f801 [troff]: Describe character classes better (2/2).
2ddfba4cb (refs/bisect/good-2ddfba4cb71b4fb749f871c101aa1c20d8edd21c) [groff]: Regression-test Savannah #67571.
05eae8986 (HEAD, refs/bisect/bad) REWRITE src/roff/troff/input.cpp: Annotate mystery code.


The formatter changes between HEAD on Savannah and 2ddfb are mostly not
material, but here they are.


$ git diff origin 2ddfb -- src/roff/troff/ | cat
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 339695d2a..b1b236e92 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -2907,21 +2907,26 @@ const char *token::description()
   case TOKEN_SPACE:
     return "a space";
   case TOKEN_SPECIAL_CHAR:
-    // We normally using apostrophes for quotation in diagnostic
-    // messages, but many special character names contain them.  Fall
-    // back to double quotes if this one does.  A user-defined special
-    // character name could contain both characters; we expect such
-    // users to lie comfortably in the bed they made for themselves.
+    // We normally use apostrophes for quotation in diagnostic messages,
+    // but many special character names contain them.  Fall back to
+    // double quotes if this one does.  A user-defined special character
+    // name could contain both characters; we expect such users to lie
+    // comfortably in the bed they made for themselves.
     {
       const char *sc = nm.contents();
       char qc = '\'';
       if (strchr(sc, '\'') != 0 /* nullptr */)
        qc = '"';
       // TODO: This truncates the names of impractically long special
-      // character names.  Do something about that.  (The truncation is
-      // visually indicated by the absence of a closing quotation mark.)
-      (void) snprintf(buf, maxstr, "special character %c%s%c", qc, sc,
-                     qc);
+      // character or character class names.  Do something about that.
+      // (The truncation is visually indicated by the absence of a
+      // closing quotation mark.)
+      if (using_character_classes && tok.get_char()->is_class())
+       (void) snprintf(buf, maxstr, "character class %c%s%c", qc, sc,
+                       qc);
+      else
+       (void) snprintf(buf, maxstr, "special character %c%s%c", qc, sc,
+                       qc);
       return buf;
     }
   case TOKEN_SPREAD:
@@ -3922,6 +3927,14 @@ void macro::print_size()
   errprint("%1", len);
 }
 
+// Use this only for zero-length macros associated with charinfo objects
+// that are character classes.
+void macro::dump()
+{
+  if (filename != 0 /* nullptr */)
+    errprint("file name: \"%1\", line number: %2\n", filename, lineno);
+}
+
 void macro::json_dump()
 {
   bool need_comma = false;
@@ -4987,8 +5000,8 @@ static void print_character_request()
     ci = tok.get_char(false /* required */,
                      true /* suppress creation */);
     if (!tok.is_character()) {
-      error("character report request expects characters as arguments;"
-           " got %1", tok.description());
+      error("character report request expects characters or character"
+           " classes as arguments; got %1", tok.description());
       break;
     }
     if (0 /* nullptr */ == ci) {
@@ -8522,6 +8535,10 @@ void define_class()
     return;
   }
   charinfo *ci = get_charinfo(nm);
+  // Assign the charinfo an empty macro as a hack to record the
+  // file:line location of its definition.
+  macro *m = new macro;
+  (void) ci->set_macro(m);
   charinfo *child1 = 0 /* nullptr */, *child2 = 0 /* nullptr */;
   while (!tok.is_newline() && !tok.is_eof()) {
     tok.skip();
@@ -8925,6 +8942,16 @@ const char *break_flag_reg::get_string()
   return i_to_a(input_stack::get_break_flag());
 }
 
+class character_classes_in_use_reg : public reg {
+public:
+  const char *get_string();
+};
+
+const char *character_classes_in_use_reg::get_string()
+{
+  return i_to_a(using_character_classes);
+}
+
 class enclosing_want_att_compat_reg : public reg {
 public:
   const char *get_string();
@@ -9957,6 +9984,7 @@ void init_input_requests()
   register_dictionary.define(".$", new nargs_reg);
   register_dictionary.define(".br", new break_flag_reg);
   register_dictionary.define(".C", new
readonly_boolean_register(&want_att_compat));
+  register_dictionary.define(".class", new character_classes_in_use_reg);
   register_dictionary.define(".cp", new enclosing_want_att_compat_reg);
   register_dictionary.define(".O", new variable_reg(&suppression_level));
   register_dictionary.define(".c", new lineno_reg);
@@ -10700,92 +10728,139 @@ bool charinfo::contains(charinfo *, bool)
 
 void charinfo::dump()
 {
-  if (translation != 0 /* nullptr */)
-    errprint("  is translated\n");
-  else
-    errprint("  is not translated\n");
-  if (mac != 0 /* nullptr */) {
-    errprint("  has a macro: ");
-    mac->json_dump();
+  if (is_class()) {
+    std::vector<std::pair<int, int> >::const_iterator ranges_iter;
+    ranges_iter = ranges.begin();
+    assert(mac != 0 /* nullptr */);
+    errprint("  defined at: ");
+    mac->dump();
+    fflush(stderr);
+    errprint("  contains ranges: ");
+    const size_t buflen = 8; // "U+" + four/five hex digits + '\0'
+    int range_begin = 0;
+    int range_end = 0;
+    char beg_hexbuf[buflen];
+    char end_hexbuf[buflen];
+    (void) memset(beg_hexbuf, '\0', buflen);
+    (void) memset(end_hexbuf, '\0', buflen);
+    bool has_ranges = false;
+    while (ranges_iter != ranges.end()) {
+      has_ranges = true;
+      range_begin = ranges_iter->first;
+      range_end = ranges_iter->second;
+      (void) snprintf(beg_hexbuf, buflen, "U+%.4X", range_begin);
+      (void) snprintf(end_hexbuf, buflen, "U+%.4X", range_end);
+      // TODO: comma-separate?  JSON list?
+      if (range_begin == range_end)
+       errprint("%1 ", beg_hexbuf, end_hexbuf);
+      else
+       errprint("%1-%2 ", beg_hexbuf, end_hexbuf);
+      ++ranges_iter;
+    }
+    if (!has_ranges)
+      errprint("(none)");
+    errprint("\n");
+    errprint("  contains nested classes: ");
+    std::vector<charinfo *>::const_iterator nested_iter;
+    nested_iter = nested_classes.begin();
+    bool has_nested_classes = false;
+    while (nested_iter != nested_classes.end()) {
+      has_nested_classes = true;
+      // TODO: Here's where JSON would really pay off.
+      (*nested_iter)->dump();
+    }
+    if (!has_nested_classes)
+      errprint("(none)");
     errprint("\n");
   }
-  else
-    errprint("  does not have a macro\n");
-  errprint("  special translation: %1\n",
-          static_cast<int>(special_translation));
-  errprint("  hyphenation code: %1\n",
-          static_cast<int>(hyphenation_code));
-  errprint("  flags: %1 (", flags);
-  if (0U == flags)
-    errprint("none)\n");
   else {
-    char none[] = { '\0' };
-    char comma[] = { ',', ' ', '\0' };
-    char *separator = none;
-    if (flags & ENDS_SENTENCE) {
-      errprint("%1ends sentence", separator);
-      separator = comma;
-    }
-    if (flags & ALLOWS_BREAK_BEFORE) {
-      errprint("%1allows break before", separator);
-      separator = comma;
-    }
-    if (flags & ALLOWS_BREAK_AFTER) {
-      errprint("%1allows break after", separator);
-      separator = comma;
-    }
-    if (flags & OVERLAPS_HORIZONTALLY) {
-      errprint("%1overlaps horizontally", separator);
-      separator = comma;
-    }
-    if (flags & OVERLAPS_VERTICALLY) {
-      errprint("%1overlaps vertically", separator);
-      separator = comma;
-    }
-    if (flags & IS_TRANSPARENT_TO_END_OF_SENTENCE) {
-      errprint("%1is transparent to end of sentence", separator);
-      separator = comma;
-    }
-    if (flags & IGNORES_SURROUNDING_HYPHENATION_CODES) {
-      errprint("%1ignores surrounding hyphenation codes", separator);
-      separator = comma;
-    }
-    if (flags & PROHIBITS_BREAK_BEFORE) {
-      errprint("%1prohibits break before", separator);
-      separator = comma;
-    }
-    if (flags & PROHIBITS_BREAK_AFTER) {
-      errprint("%1prohibits break after", separator);
-      separator = comma;
-    }
-    if (flags & IS_INTERWORD_SPACE) {
-      errprint("%1is interword space", separator);
-      separator = comma;
-    }
-    errprint(")\n");
-  }
-  errprint("  asciify code: %1\n", static_cast<int>(asciify_code));
-  errprint("  ASCII code: %1\n", static_cast<int>(ascii_code));
-  // Also see node.cpp::glyph_node::asciify().
-  int mapping = get_unicode_mapping();
-  if (mapping >= 0) {
-    const size_t buflen = 6; // enough for five hex digits + '\0'
-    char hexbuf[buflen];
-    (void) memset(hexbuf, '\0', buflen);
-    (void) snprintf(hexbuf, buflen, "%.4X", mapping);
-    errprint("  Unicode mapping: U+%1\n", hexbuf);
+    if (translation != 0 /* nullptr */)
+      errprint("  is translated\n");
+    else
+      errprint("  is not translated\n");
+    if (mac != 0 /* nullptr */) {
+      errprint("  has a macro: ");
+      mac->json_dump();
+      errprint("\n");
+    }
+    else
+      errprint("  does not have a macro\n");
+    errprint("  special translation: %1\n",
+            static_cast<int>(special_translation));
+    errprint("  hyphenation code: %1\n",
+            static_cast<int>(hyphenation_code));
+    errprint("  flags: %1 (", flags);
+    if (0U == flags)
+      errprint("none)\n");
+    else {
+      char none[] = { '\0' };
+      char comma[] = { ',', ' ', '\0' };
+      char *separator = none;
+      if (flags & ENDS_SENTENCE) {
+       errprint("%1ends sentence", separator);
+       separator = comma;
+      }
+      if (flags & ALLOWS_BREAK_BEFORE) {
+       errprint("%1allows break before", separator);
+       separator = comma;
+      }
+      if (flags & ALLOWS_BREAK_AFTER) {
+       errprint("%1allows break after", separator);
+       separator = comma;
+      }
+      if (flags & OVERLAPS_HORIZONTALLY) {
+       errprint("%1overlaps horizontally", separator);
+       separator = comma;
+      }
+      if (flags & OVERLAPS_VERTICALLY) {
+       errprint("%1overlaps vertically", separator);
+       separator = comma;
+      }
+      if (flags & IS_TRANSPARENT_TO_END_OF_SENTENCE) {
+       errprint("%1is transparent to end of sentence", separator);
+       separator = comma;
+      }
+      if (flags & IGNORES_SURROUNDING_HYPHENATION_CODES) {
+       errprint("%1ignores surrounding hyphenation codes", separator);
+       separator = comma;
+      }
+      if (flags & PROHIBITS_BREAK_BEFORE) {
+       errprint("%1prohibits break before", separator);
+       separator = comma;
+      }
+      if (flags & PROHIBITS_BREAK_AFTER) {
+       errprint("%1prohibits break after", separator);
+       separator = comma;
+      }
+      if (flags & IS_INTERWORD_SPACE) {
+       errprint("%1is interword space", separator);
+       separator = comma;
+      }
+      errprint(")\n");
+    }
+    errprint("  asciify code: %1\n", static_cast<int>(asciify_code));
+    errprint("  ASCII code: %1\n", static_cast<int>(ascii_code));
+    // Also see node.cpp::glyph_node::asciify().
+    int mapping = get_unicode_mapping();
+    if (mapping >= 0) {
+      const size_t buflen = 6; // enough for five hex digits + '\0'
+      char hexbuf[buflen];
+      (void) memset(hexbuf, '\0', buflen);
+      (void) snprintf(hexbuf, buflen, "%.4X", mapping);
+      errprint("  Unicode mapping: U+%1\n", hexbuf);
+    }
+    else
+      errprint("  Unicode mapping: none (%1)\n", mapping);
+    errprint("  is%1 found\n", is_not_found ? " not" : "");
+    errprint("  is%1 transparently translatable\n",
+            is_transparently_translatable ? "" : " not");
+    errprint("  is%1 translatable as input\n",
+            translatable_as_input ? "" : " not");
+    const char *modestr = character_mode_description(mode);
+    if (strcmp(modestr, "") == 0)
+      modestr =" normal";
+    errprint("  mode:%1\n", modestr);
   }
-  else
-    errprint("  Unicode mapping: none (%1)\n", mapping);
-  errprint("  is%1 found\n", is_not_found ? " not" : "");
-  errprint("  is%1 transparently translatable\n",
-          is_transparently_translatable ? "" : " not");
-  errprint("  is%1 translatable as input\n",
-          translatable_as_input ? "" : " not");
-  const char *modestr = character_mode_description(mode);
-  if (strcmp(modestr, "") == 0)
-    modestr =" normal";
-  errprint("  mode:%1\n", modestr);
   fflush(stderr);
 }
 
diff --git a/src/roff/troff/request.h b/src/roff/troff/request.h
index ce9967cf2..dd7f92428 100644
--- a/src/roff/troff/request.h
+++ b/src/roff/troff/request.h
@@ -70,6 +70,7 @@ public:
   bool is_diversion();
   bool is_string();
   void clear_string_flag();
+  void dump();
   void json_dump();
   friend class string_iterator;
   friend void chop_macro();


This looks like **way** more churn than it really is.  Most of it is
reindentation from the changes to dumping, which make `pchar` on a character
class report different information from `pchar` on a special character.

(I now think I should probably revert my addition of a `.class` register.)

But here's what I'm writing about.  That last change, the one marked
"REWRITE", is a doozy.  It _had_ been just a comment, which I pasted into one
of the recent highly active Savannah tickets.  But I changed it to do this:


$ git show
commit 05eae8986ff3800b2616c62e9b617664914c76b9 (HEAD, refs/bisect/bad)
Author: G. Branden Robinson <[email protected]>
Date:   Fri Nov 14 16:56:55 2025 -0600

    REWRITE src/roff/troff/input.cpp: Annotate mystery code.

diff --git a/src/roff/troff/charinfo.h b/src/roff/troff/charinfo.h
index 797a7f5f2..0ad68f70c 100644
--- a/src/roff/troff/charinfo.h
+++ b/src/roff/troff/charinfo.h
@@ -285,7 +285,6 @@ inline symbol *charinfo::get_symbol()
 
 inline void charinfo::add_to_class(int c)
 {
-  using_character_classes = true;
   // TODO ranges cumbersome for single characters?
   ranges.push_back(std::pair<int, int>(c, c));
 }
@@ -293,13 +292,11 @@ inline void charinfo::add_to_class(int c)
 inline void charinfo::add_to_class(int lo,
                                   int hi)
 {
-  using_character_classes = true;
   ranges.push_back(std::pair<int, int>(lo, hi));
 }
 
 inline void charinfo::add_to_class(charinfo *ci)
 {
-  using_character_classes = true;
   nested_classes.push_back(ci);
 }
 
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index b1b236e92..f79325fb9 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -8635,6 +8635,7 @@ void define_class()
     skip_line();
     return;
   }
+  using_character_classes = true;
   (void) char_class_dictionary.lookup(nm, ci);
   skip_line();
 }
@@ -10596,7 +10597,6 @@ void get_flags()
     assert(!s.is_null());
     ci->get_flags();
   }
-  using_character_classes = false;
 }
 
 // Get the union of all flags affecting this charinfo.


This change shot performance right to hell: normally, using all the cores on
my dev box, I can run the entire test suite in 8 to 9 seconds.

With just the patch above applied, the tests take more like 38 seconds.

Deri would kill me.

I still think getting rid of this global bit of parser state is the right
idea, because it's bound up with the issue of this report: "`class` request
works or not depending on where in the input it's called", which I could
re-express as "character class interpolations should work the same regardless
of where in the input they occur".
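
To make the order dependence concrete, here is a toy model.  It is not groff
code, and every name in it is made up for illustration.  It assumes only what
the diffs above show: before the "REWRITE" commit, the flag was raised in
`charinfo::add_to_class()` and lowered at the end of `get_flags()`.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only: every name here is hypothetical, not groff's code.
static bool using_classes = false;        // stand-in for the global flag
static std::set<std::string> class_names; // stand-in for per-object state

// Defining a class records it and, as a side effect, raises the flag.
void define_class(const std::string &name)
{
  assert(!name.empty());
  class_names.insert(name);
  using_classes = true;
}

// Stand-in for a later routine that lowers the flag when it finishes.
void resolve_flags()
{
  using_classes = false;
}

// Guard that consults the global first: its answer depends on what ran
// earlier in the input, not only on `name`.
bool flag_guard_sees_class(const std::string &name)
{
  return using_classes && class_names.count(name) != 0;
}

// Guard that asks only about the object: same answer everywhere.
bool object_guard_sees_class(const std::string &name)
{
  return class_names.count(name) != 0;
}
```

In this model, the same name is recognized as a class right after
`define_class()` but not after `resolve_flags()` has run, unless the guard
ignores the global and asks the object alone.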

I already have a patch in development to make the formatter complain anytime a
character class gets used where a simple special character is expected.

Observe:


$ git stash show -p 0 | cat
diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index b1e1fb321..e4ce56c4a 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -1676,6 +1676,11 @@ void margin_character()
   while (tok.is_space())
     tok.next();
   charinfo *ci = tok.read_troff_character();
+  if (using_character_classes && ci->is_class()) {
+    error("cannot use %1 as a margin character", tok.description());
+    skip_line();
+    return;
+  }
   if (ci != 0 /* nullptr */) {
     // Call tok.next() only after making the node so that
     // .mc \s+9\(br\s0 works.
@@ -3849,6 +3854,12 @@ static void add_hyphenation_exceptions()
     while (i < WORD_MAX && !tok.is_space() && !tok.is_newline()
           && !tok.is_eof()) {
       charinfo *ci = tok.read_troff_character(true /* required */);
+      if (using_character_classes && ci->is_class()) {
+       error("cannot use %1 in a hyphenation exception word",
+             tok.description());
+       skip_line();
+       return;
+      }
       if (0 /* nullptr */ == ci) {
        error("%1 has no associated character information(!)",
              tok.description());
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 15d0a8495..166cf1752 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -1667,6 +1667,12 @@ node *do_overstrike() // \o
     }
     else {
       charinfo *ci = tok.read_troff_character(true /* required */);
+      if (using_character_classes && ci->is_class()) {
+       error("cannot use %1 in overstrike escape sequence",
+             tok.description());
+       delete osnode;
+       return 0 /* nullptr */;
+      }
       if (ci != 0 /* nullptr */) {
        node *n = curenv->make_char_node(ci);
        if (n != 0 /* nullptr */)
@@ -1720,6 +1726,12 @@ static node *do_bracket() // \b
        && (want_att_compat || input_stack::get_level() == start_level))
       break;
     charinfo *ci = tok.read_troff_character(true /* required */);
+    if (using_character_classes && ci->is_class()) {
+      error("cannot use %1 in bracket-building escape sequence",
+           tok.description());
+      delete bracketnode;
+      return 0 /* nullptr */;
+    }
     if (ci != 0 /* nullptr */) {
       node *n = curenv->make_char_node(ci);
       if (n != 0 /* nullptr */)
@@ -2616,6 +2628,11 @@ void token::next()
            nd = new zero_width_node(nd);
          else {
            charinfo *ci = read_troff_character(true /* required */);
+           if (using_character_classes && ci->is_class()) {
+             error("cannot use %1 as argument to zero-width escape"
+                   " sequence", tok.description());
+             return;
+           }
            if (0 /* nullptr */ == ci)
              break;
            node *gn = curenv->make_char_node(ci);
@@ -2922,13 +2939,15 @@ const char *token::description()
       // character or character class names.  Do something about that.
       // (The truncation is visually indicated by the absence of a
       // closing quotation mark.)
-      if (using_character_classes
-         && tok.read_troff_character()->is_class())
-       (void) snprintf(buf, maxstr, "character class %c%s%c", qc, sc,
-                       qc);
-      else
-       (void) snprintf(buf, maxstr, "special character %c%s%c", qc, sc,
-                       qc);
+      static const char special_character[] = "special character";
+      static const char character_class[] = "character class";
+      const char *type = special_character;
+      if (using_character_classes) {
+       charinfo *ci = get_charinfo(nm);
+        if ((ci != 0 /* nullptr */) && ci->is_class())
+         type = character_class;
+      }
+      (void) snprintf(buf, maxstr, "%s %c%s%c", type, qc, sc, qc);
       return buf;
     }
   case TOKEN_SPREAD:
@@ -3557,12 +3576,14 @@ void process_input_stack()
                tok.description());
       else {
        reading_beginning_of_input_line = false;
+       debug("GBR1: calling process() on %1", tok.description());
        tok.process();
       }
       break;
     default:
       {
        reading_beginning_of_input_line = false;
+       debug("GBR2: calling process() on %1", tok.description());
        tok.process();
        break;
       }
@@ -4911,6 +4932,7 @@ void define_character(char_mode mode, const char *font_name)
     skip_line();
     return;
   }
+  // TODO: If `ci` is already a character class, clobber it.
   if (font_name != 0 /* nullptr */) {
     string s(font_name);
     s += ' ';
@@ -5042,6 +5064,7 @@ static void remove_character()
        charinfo *ci = tok.read_troff_character(true /* required */,
                                                true /* suppress
                                                        creation */);
+       // TODO: If `ci` is a character class, clobber it.
        if (0 /* nullptr */ == ci) {
          if (!tok.is_indexed_character())
            warning(WARN_CHAR, "%1 is not defined", tok.description());
@@ -5919,6 +5942,11 @@ static bool get_line_arg(units *n, unsigned char si, charinfo **cip)
     if (!(start_token == tok
          && input_stack::get_level() == start_level)) {
       *cip = tok.read_troff_character(true /* required */);
+      if (using_character_classes && (*cip)->is_class()) {
+       error("cannot use %1 in line-drawing escape sequence",
+             tok.description());
+       return false;
+      }
       tok.next();
     }
     if (!(start_token == tok
@@ -6229,6 +6257,7 @@ static void do_width() // \w
     if (tok == start_token
        && (want_att_compat || input_stack::get_level() == start_level))
       break;
+    debug("GBR3: calling process() on %1", tok.description());
     tok.process();
   }
   env.wrap_up_tab();
@@ -6274,8 +6303,10 @@ void read_title_parts(node **part, hunits *part_width)
       if ((page_character != 0 /* nullptr */)
          && (tok.read_troff_character() == page_character))
        interpolate_register(percent_symbol, 0);
-      else
+      else {
+       debug("GBR4: calling process() on %1", tok.description());
        tok.process();
+      }
       tok.next();
     }
     curenv->wrap_up_tab();
@@ -6909,6 +6940,7 @@ static bool are_comparands_equal()
          && (want_att_compat
              || input_stack::get_level() == delim_level))
         break;
+      debug("GBR5: calling process() on %1", tok.description());
       tok.process();
     }
     curenv = &env2;
@@ -8639,6 +8671,7 @@ void define_class()
     skip_line();
     return;
   }
+  debug("GBR: now using character classes");
   using_character_classes = true;
   (void) char_class_dictionary.lookup(nm, ci);
   skip_line();
@@ -8839,7 +8872,19 @@ void token::process()
     curenv->space();
     break;
   case TOKEN_SPECIAL_CHAR:
-    curenv->add_char(get_charinfo(nm));
+    {
+      charinfo *ci = get_charinfo(nm);
+      if (!using_character_classes) {
+       debug("GBR: token:process(): not using character classes");
+       curenv->add_char(get_charinfo(nm));
+      }
+      else if ((ci != 0 /* nullptr */) && !ci->is_class()) {
+       debug("GBR: token:process(): using character classes, but special character is not a character class");
+       curenv->add_char(get_charinfo(nm));
+      }
+      else
+       error("cannot interpolate %1", description());
+    }
     break;
   case TOKEN_SPREAD:
     curenv->spread();
@@ -10081,6 +10126,7 @@ node *charinfo_to_node_list(charinfo *ci, const environment *envp)
       break;
     }
     else
+      debug("GBR6: calling process() on %1", tok.description());
       tok.process();
   }
   node *n = curenv->extract_output_line();


This work was only about 50% done.

I think I need to finish up that work and see if the addition of these
"ci->is_class()" guards restores acceptable performance.  I suspect what's
happening is that character classes are getting traversed into and searched
when they need not be.  The order in which parallelized tests eventually
finish supports this hypothesis.


PASS: src/utils/grog/tests/smoke-test.sh
PASS: src/roff/groff/tests/do-not-loop-infinitely-when-breaking-cjk.sh
PASS: src/roff/groff/tests/check-delimiter-validity.sh
PASS: contrib/hdtbl/examples/test-hdtbl.sh
PASS: tmac/tests/localization-works.sh


These guys reliably trail the pack as of this commit.
"check-delimiter-validity.sh" is expected to be slow, because it launches the
formatter dozens or hundreds of times in an exploration, even now not quite
exhaustive, of the available escape sequence delimiter space.  _grog_'s smoke
test has always been slow because it's a lot of Perl wrapped around running
_groff_ several times.

But the other 3 all hit the CJK macro files: two for what I suspect are
obvious reasons, and "test-hdtbl.sh" because it prints (empty) code charts for
the abstract CJK faces contributed by TANAKA Takuji.  So all 3 of them
definitely exercise code paths involving the character class dictionary.

So, bottom line, "using_character_classes" is working _both_ as a performance
optimizer **and** as a parser-state-tweaking bit.

I don't know if the former was intended, but I now think I can scotch the
variable and simplify these pending


if (using_character_classes && ci->is_class()) {


tests to just


if (ci->is_class()) {


...and if that's _still_ crappily slow, I can make `charinfo::is_class` either
ask the `char_class_dictionary` object if it's populated, or "memoize" (I've
always hated that word, though not the technique) the fact thereof.
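
A minimal sketch of the "ask the dictionary if it's populated" variant
follows.  It is an assumption-laden illustration, not the planned patch:
`char_info`, `class_dictionary`, `register_class()`, and `names_class()` are
hypothetical stand-ins, and the body of `is_class()` is assumed rather than
copied from groff.  The guard also checks the pointer before dereferencing
it, since the call sites above still test `ci` against null afterward.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for charinfo; only the class-related members.
struct char_info {
  std::vector<std::pair<int, int> > ranges;
  std::vector<char_info *> nested_classes;
  // Assumed definition: a class is a charinfo that has members.
  bool is_class() const
  {
    return !ranges.empty() || !nested_classes.empty();
  }
};

// Hypothetical stand-in for char_class_dictionary.
static std::map<std::string, char_info *> class_dictionary;

void register_class(const std::string &name, char_info *ci)
{
  assert(ci != 0 /* nullptr */);
  class_dictionary[name] = ci;
}

// Guard with a cheap early exit: if no class was ever registered, no
// charinfo can name one, so skip the per-object check entirely.
bool names_class(const char_info *ci)
{
  if (class_dictionary.empty())
    return false;
  return (ci != 0 /* nullptr */) && ci->is_class();
}
```

The memoizing variant would replace the `class_dictionary.empty()` call with a
cached Boolean that `register_class()` sets once and nothing ever clears,
which keeps the fast path without reintroducing resettable parser state.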


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67571>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
