Follow-up Comment #20, bug #67571 (group groff):
Looks like I get to pivot to a **third** position.
Here's a summary of my working copy at a point of bisection. I'll explain why
I'm bisecting in a moment.
$ git log --oneline --reverse origin..HEAD
632b9d7b6 ChangeLog: Fix component annotations.
75f3d1db3 [doc,man]: Clarify "Other differences" discussion.
f290bb60d src/roff/troff/input.cpp: Fix grammar in comment.
b111f6f08 src/libs/libgroff/glyphuni.cpp: Correct comment.
3c770e337 [troff]: Add `.class` read-only Boolean register.
9f5e58272 [troff]: Track character class defn locations.
6ed81442c (refs/bisect/good-6ed81442c60835e544de183cee202435046bd17b) [troff]: Describe character classes better (1/2).
71b13f801 [troff]: Describe character classes better (2/2).
2ddfba4cb (refs/bisect/good-2ddfba4cb71b4fb749f871c101aa1c20d8edd21c) [groff]: Regression-test Savannah #67571.
05eae8986 (HEAD, refs/bisect/bad) REWRITE src/roff/troff/input.cpp: Annotate mystery code.
The changes to the formatter between HEAD on Savannah and 2ddfb are mostly
not material, but here they are.
$ git diff origin 2ddfb -- src/roff/troff/ | cat
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 339695d2a..b1b236e92 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -2907,21 +2907,26 @@ const char *token::description()
case TOKEN_SPACE:
return "a space";
case TOKEN_SPECIAL_CHAR:
- // We normally using apostrophes for quotation in diagnostic
- // messages, but many special character names contain them. Fall
- // back to double quotes if this one does. A user-defined special
- // character name could contain both characters; we expect such
- // users to lie comfortably in the bed they made for themselves.
+ // We normally use apostrophes for quotation in diagnostic messages,
+ // but many special character names contain them. Fall back to
+ // double quotes if this one does. A user-defined special character
+ // name could contain both characters; we expect such users to lie
+ // comfortably in the bed they made for themselves.
{
const char *sc = nm.contents();
char qc = '\'';
if (strchr(sc, '\'') != 0 /* nullptr */)
qc = '"';
// TODO: This truncates the names of impractically long special
- // character names. Do something about that. (The truncation is
- // visually indicated by the absence of a closing quotation mark.)
- (void) snprintf(buf, maxstr, "special character %c%s%c", qc, sc,
- qc);
+ // character or character class names. Do something about that.
+ // (The truncation is visually indicated by the absence of a
+ // closing quotation mark.)
+ if (using_character_classes && tok.get_char()->is_class())
+ (void) snprintf(buf, maxstr, "character class %c%s%c", qc, sc,
+ qc);
+ else
+ (void) snprintf(buf, maxstr, "special character %c%s%c", qc, sc,
+ qc);
return buf;
}
case TOKEN_SPREAD:
@@ -3922,6 +3927,14 @@ void macro::print_size()
errprint("%1", len);
}
+// Use this only for zero-length macros associated with charinfo objects
+// that are character classes.
+void macro::dump()
+{
+ if (filename != 0 /* nullptr */)
+ errprint("file name: \"%1\", line number: %2\n", filename, lineno);
+}
+
void macro::json_dump()
{
bool need_comma = false;
@@ -4987,8 +5000,8 @@ static void print_character_request()
ci = tok.get_char(false /* required */,
true /* suppress creation */);
if (!tok.is_character()) {
- error("character report request expects characters as arguments;"
- " got %1", tok.description());
+ error("character report request expects characters or character"
+ " classes as arguments; got %1", tok.description());
break;
}
if (0 /* nullptr */ == ci) {
@@ -8522,6 +8535,10 @@ void define_class()
return;
}
charinfo *ci = get_charinfo(nm);
+ // Assign the charinfo an empty macro as a hack to record the
+ // file:line location of its definition.
+ macro *m = new macro;
+ (void) ci->set_macro(m);
charinfo *child1 = 0 /* nullptr */, *child2 = 0 /* nullptr */;
while (!tok.is_newline() && !tok.is_eof()) {
tok.skip();
@@ -8925,6 +8942,16 @@ const char *break_flag_reg::get_string()
return i_to_a(input_stack::get_break_flag());
}
+class character_classes_in_use_reg : public reg {
+public:
+ const char *get_string();
+};
+
+const char *character_classes_in_use_reg::get_string()
+{
+ return i_to_a(using_character_classes);
+}
+
class enclosing_want_att_compat_reg : public reg {
public:
const char *get_string();
@@ -9957,6 +9984,7 @@ void init_input_requests()
register_dictionary.define(".$", new nargs_reg);
register_dictionary.define(".br", new break_flag_reg);
register_dictionary.define(".C", new
readonly_boolean_register(&want_att_compat));
+ register_dictionary.define(".class", new character_classes_in_use_reg);
register_dictionary.define(".cp", new enclosing_want_att_compat_reg);
register_dictionary.define(".O", new variable_reg(&suppression_level));
register_dictionary.define(".c", new lineno_reg);
@@ -10700,92 +10728,139 @@ bool charinfo::contains(charinfo *, bool)
void charinfo::dump()
{
- if (translation != 0 /* nullptr */)
- errprint(" is translated\n");
- else
- errprint(" is not translated\n");
- if (mac != 0 /* nullptr */) {
- errprint(" has a macro: ");
- mac->json_dump();
+ if (is_class()) {
+ std::vector<std::pair<int, int> >::const_iterator ranges_iter;
+ ranges_iter = ranges.begin();
+ assert(mac != 0 /* nullptr */);
+ errprint(" defined at: ");
+ mac->dump();
+ fflush(stderr);
+ errprint(" contains ranges: ");
+ const size_t buflen = 8; // "U+" + four/five hex digits + '\0'
+ int range_begin = 0;
+ int range_end = 0;
+ char beg_hexbuf[buflen];
+ char end_hexbuf[buflen];
+ (void) memset(beg_hexbuf, '\0', buflen);
+ (void) memset(end_hexbuf, '\0', buflen);
+ bool has_ranges = false;
+ while (ranges_iter != ranges.end()) {
+ has_ranges = true;
+ range_begin = ranges_iter->first;
+ range_end = ranges_iter->second;
+ (void) snprintf(beg_hexbuf, buflen, "U+%.4X", range_begin);
+ (void) snprintf(end_hexbuf, buflen, "U+%.4X", range_end);
+ // TODO: comma-separate? JSON list?
+ if (range_begin == range_end)
+ errprint("%1 ", beg_hexbuf, end_hexbuf);
+ else
+ errprint("%1-%2 ", beg_hexbuf, end_hexbuf);
+ ++ranges_iter;
+ }
+ if (!has_ranges)
+ errprint("(none)");
+ errprint("\n");
+ errprint(" contains nested classes: ");
+ std::vector<charinfo *>::const_iterator nested_iter;
+ nested_iter = nested_classes.begin();
+ bool has_nested_classes = false;
+ while (nested_iter != nested_classes.end()) {
+ has_nested_classes = true;
+ // TODO: Here's where JSON would really pay off.
+ (*nested_iter)->dump();
+ }
+ if (!has_nested_classes)
+ errprint("(none)");
errprint("\n");
}
- else
- errprint(" does not have a macro\n");
- errprint(" special translation: %1\n",
- static_cast<int>(special_translation));
- errprint(" hyphenation code: %1\n",
- static_cast<int>(hyphenation_code));
- errprint(" flags: %1 (", flags);
- if (0U == flags)
- errprint("none)\n");
else {
- char none[] = { '\0' };
- char comma[] = { ',', ' ', '\0' };
- char *separator = none;
- if (flags & ENDS_SENTENCE) {
- errprint("%1ends sentence", separator);
- separator = comma;
- }
- if (flags & ALLOWS_BREAK_BEFORE) {
- errprint("%1allows break before", separator);
- separator = comma;
- }
- if (flags & ALLOWS_BREAK_AFTER) {
- errprint("%1allows break after", separator);
- separator = comma;
- }
- if (flags & OVERLAPS_HORIZONTALLY) {
- errprint("%1overlaps horizontally", separator);
- separator = comma;
- }
- if (flags & OVERLAPS_VERTICALLY) {
- errprint("%1overlaps vertically", separator);
- separator = comma;
- }
- if (flags & IS_TRANSPARENT_TO_END_OF_SENTENCE) {
- errprint("%1is transparent to end of sentence", separator);
- separator = comma;
- }
- if (flags & IGNORES_SURROUNDING_HYPHENATION_CODES) {
- errprint("%1ignores surrounding hyphenation codes", separator);
- separator = comma;
- }
- if (flags & PROHIBITS_BREAK_BEFORE) {
- errprint("%1prohibits break before", separator);
- separator = comma;
- }
- if (flags & PROHIBITS_BREAK_AFTER) {
- errprint("%1prohibits break after", separator);
- separator = comma;
- }
- if (flags & IS_INTERWORD_SPACE) {
- errprint("%1is interword space", separator);
- separator = comma;
- }
- errprint(")\n");
- }
- errprint(" asciify code: %1\n", static_cast<int>(asciify_code));
- errprint(" ASCII code: %1\n", static_cast<int>(ascii_code));
- // Also see node.cpp::glyph_node::asciify().
- int mapping = get_unicode_mapping();
- if (mapping >= 0) {
- const size_t buflen = 6; // enough for five hex digits + '\0'
- char hexbuf[buflen];
- (void) memset(hexbuf, '\0', buflen);
- (void) snprintf(hexbuf, buflen, "%.4X", mapping);
- errprint(" Unicode mapping: U+%1\n", hexbuf);
+ if (translation != 0 /* nullptr */)
+ errprint(" is translated\n");
+ else
+ errprint(" is not translated\n");
+ if (mac != 0 /* nullptr */) {
+ errprint(" has a macro: ");
+ mac->json_dump();
+ errprint("\n");
+ }
+ else
+ errprint(" does not have a macro\n");
+ errprint(" special translation: %1\n",
+ static_cast<int>(special_translation));
+ errprint(" hyphenation code: %1\n",
+ static_cast<int>(hyphenation_code));
+ errprint(" flags: %1 (", flags);
+ if (0U == flags)
+ errprint("none)\n");
+ else {
+ char none[] = { '\0' };
+ char comma[] = { ',', ' ', '\0' };
+ char *separator = none;
+ if (flags & ENDS_SENTENCE) {
+ errprint("%1ends sentence", separator);
+ separator = comma;
+ }
+ if (flags & ALLOWS_BREAK_BEFORE) {
+ errprint("%1allows break before", separator);
+ separator = comma;
+ }
+ if (flags & ALLOWS_BREAK_AFTER) {
+ errprint("%1allows break after", separator);
+ separator = comma;
+ }
+ if (flags & OVERLAPS_HORIZONTALLY) {
+ errprint("%1overlaps horizontally", separator);
+ separator = comma;
+ }
+ if (flags & OVERLAPS_VERTICALLY) {
+ errprint("%1overlaps vertically", separator);
+ separator = comma;
+ }
+ if (flags & IS_TRANSPARENT_TO_END_OF_SENTENCE) {
+ errprint("%1is transparent to end of sentence", separator);
+ separator = comma;
+ }
+ if (flags & IGNORES_SURROUNDING_HYPHENATION_CODES) {
+ errprint("%1ignores surrounding hyphenation codes", separator);
+ separator = comma;
+ }
+ if (flags & PROHIBITS_BREAK_BEFORE) {
+ errprint("%1prohibits break before", separator);
+ separator = comma;
+ }
+ if (flags & PROHIBITS_BREAK_AFTER) {
+ errprint("%1prohibits break after", separator);
+ separator = comma;
+ }
+ if (flags & IS_INTERWORD_SPACE) {
+ errprint("%1is interword space", separator);
+ separator = comma;
+ }
+ errprint(")\n");
+ }
+ errprint(" asciify code: %1\n", static_cast<int>(asciify_code));
+ errprint(" ASCII code: %1\n", static_cast<int>(ascii_code));
+ // Also see node.cpp::glyph_node::asciify().
+ int mapping = get_unicode_mapping();
+ if (mapping >= 0) {
+ const size_t buflen = 6; // enough for five hex digits + '\0'
+ char hexbuf[buflen];
+ (void) memset(hexbuf, '\0', buflen);
+ (void) snprintf(hexbuf, buflen, "%.4X", mapping);
+ errprint(" Unicode mapping: U+%1\n", hexbuf);
+ }
+ else
+ errprint(" Unicode mapping: none (%1)\n", mapping);
+ errprint(" is%1 found\n", is_not_found ? " not" : "");
+ errprint(" is%1 transparently translatable\n",
+ is_transparently_translatable ? "" : " not");
+ errprint(" is%1 translatable as input\n",
+ translatable_as_input ? "" : " not");
+ const char *modestr = character_mode_description(mode);
+ if (strcmp(modestr, "") == 0)
+ modestr =" normal";
+ errprint(" mode:%1\n", modestr);
}
- else
- errprint(" Unicode mapping: none (%1)\n", mapping);
- errprint(" is%1 found\n", is_not_found ? " not" : "");
- errprint(" is%1 transparently translatable\n",
- is_transparently_translatable ? "" : " not");
- errprint(" is%1 translatable as input\n",
- translatable_as_input ? "" : " not");
- const char *modestr = character_mode_description(mode);
- if (strcmp(modestr, "") == 0)
- modestr =" normal";
- errprint(" mode:%1\n", modestr);
fflush(stderr);
}
diff --git a/src/roff/troff/request.h b/src/roff/troff/request.h
index ce9967cf2..dd7f92428 100644
--- a/src/roff/troff/request.h
+++ b/src/roff/troff/request.h
@@ -70,6 +70,7 @@ public:
bool is_diversion();
bool is_string();
void clear_string_flag();
+ void dump();
void json_dump();
friend class string_iterator;
friend void chop_macro();
This looks like **way** more churn than it really is, because I reindented
most of `charinfo::dump()` so that `pchar` on a character class reports
different information from `pchar` on a special character.
(I now think I should probably revert my addition of a `.class` register.)
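The class branch of the new dump is easier to follow in isolation. Below is a standalone sketch of its range-rendering loop: it uses the same `U+%.4X` conversion and the same "(none)" fallback as the patch, but returns a `std::string` instead of calling `errprint()`, and separates entries with single spaces rather than leaving a trailing one. The name `format_class_ranges` is hypothetical; this is not groff code. One deliberate difference: the buffers here hold nine bytes, because code points from U+100000 through U+10FFFF need six hex digits.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of groff): render a character class's
// code point ranges in the style of the patched charinfo::dump().
std::string format_class_ranges(
    const std::vector<std::pair<int, int> > &ranges)
{
  if (ranges.empty())
    return "(none)";
  std::string out;
  // "U+" + up to six hex digits (U+10FFFF) + '\0'.
  char beg_hexbuf[9];
  char end_hexbuf[9];
  std::vector<std::pair<int, int> >::const_iterator iter;
  for (iter = ranges.begin(); iter != ranges.end(); ++iter) {
    // %.4X zero-pads to at least four hex digits.
    (void) std::snprintf(beg_hexbuf, sizeof beg_hexbuf, "U+%.4X",
                         static_cast<unsigned int>(iter->first));
    (void) std::snprintf(end_hexbuf, sizeof end_hexbuf, "U+%.4X",
                         static_cast<unsigned int>(iter->second));
    if (!out.empty())
      out += ' ';
    out += beg_hexbuf;
    // A singleton range (lo == hi) prints as one code point.
    if (iter->first != iter->second) {
      out += '-';
      out += end_hexbuf;
    }
  }
  return out;
}
```

For example, a class holding the single character U+0041 plus the range U+4E00 through U+9FFF renders as `U+0041 U+4E00-U+9FFF`.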
But here's what I'm writing about. That last change, the one marked
"REWRITE", is a doozy. It _had_ been just a comment, which I pasted into one
of the recent highly active Savannah tickets. Then I changed it to do this:
$ git show
commit 05eae8986ff3800b2616c62e9b617664914c76b9 (HEAD, refs/bisect/bad)
Author: G. Branden Robinson <[email protected]>
Date: Fri Nov 14 16:56:55 2025 -0600
REWRITE src/roff/troff/input.cpp: Annotate mystery code.
diff --git a/src/roff/troff/charinfo.h b/src/roff/troff/charinfo.h
index 797a7f5f2..0ad68f70c 100644
--- a/src/roff/troff/charinfo.h
+++ b/src/roff/troff/charinfo.h
@@ -285,7 +285,6 @@ inline symbol *charinfo::get_symbol()
inline void charinfo::add_to_class(int c)
{
- using_character_classes = true;
// TODO ranges cumbersome for single characters?
ranges.push_back(std::pair<int, int>(c, c));
}
@@ -293,13 +292,11 @@ inline void charinfo::add_to_class(int c)
inline void charinfo::add_to_class(int lo,
int hi)
{
- using_character_classes = true;
ranges.push_back(std::pair<int, int>(lo, hi));
}
inline void charinfo::add_to_class(charinfo *ci)
{
- using_character_classes = true;
nested_classes.push_back(ci);
}
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index b1b236e92..f79325fb9 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -8635,6 +8635,7 @@ void define_class()
skip_line();
return;
}
+ using_character_classes = true;
(void) char_class_dictionary.lookup(nm, ci);
skip_line();
}
@@ -10596,7 +10597,6 @@ void get_flags()
assert(!s.is_null());
ci->get_flags();
}
- using_character_classes = false;
}
// Get the union of all flags affecting this charinfo.
This change shot performance right to hell. Normally, using all the cores on
my dev box, I can run the entire test suite in 8 to 9 seconds.
With just the patch above applied, it takes more like 38 seconds, over four
times as long.
Deri would kill me.
I still think getting rid of this global bit of parser state is the right
idea, because it's bound up with the problem this report describes: "`class`
request works or not depending on where in the input it's called", which I
could re-express as "character class interpolations should work the same
regardless of where in the input they occur".
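To make the position dependence concrete, here is a toy model (hypothetical names, not groff code) of the two flag placements visible in the REWRITE commit. Before it, each `charinfo::add_to_class()` overload set `using_character_classes` and `get_flags()` cleared it; after it, `define_class()` sets the flag and nothing clears it. Guards of the form `using_character_classes && ci->is_class()` are modeled by `treated_as_class()`.

```cpp
#include <cassert>

// Toy model of the flag before the REWRITE commit: every
// add_to_class() call sets it, and get_flags() clears it.
struct old_flag_model {
  bool using_character_classes;
  old_flag_model() : using_character_classes(false) {}
  void add_to_class() { using_character_classes = true; }
  void get_flags() { using_character_classes = false; }
  // Mirrors guards like `using_character_classes && ci->is_class()`.
  bool treated_as_class(bool is_class) const {
    return using_character_classes && is_class;
  }
};

// Toy model after the REWRITE commit: define_class() sets the flag
// once, and nothing clears it afterward.
struct new_flag_model {
  bool using_character_classes;
  new_flag_model() : using_character_classes(false) {}
  void define_class() { using_character_classes = true; }
  void get_flags() { /* no longer touches the flag */ }
  bool treated_as_class(bool is_class) const {
    return using_character_classes && is_class;
  }
};
```

In the old model, the same class is recognized before a `get_flags()` call and ignored after one, which is one plausible way a class's behavior could come to depend on where in the input it is used.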
I already have a patch in development to make the formatter complain anytime a
character class gets used where a simple special character is expected.
Observe:
$ git stash show -p 0 | cat
diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index b1e1fb321..e4ce56c4a 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -1676,6 +1676,11 @@ void margin_character()
while (tok.is_space())
tok.next();
charinfo *ci = tok.read_troff_character();
+ if (using_character_classes && ci->is_class()) {
+ error("cannot use %1 as a margin character", tok.description());
+ skip_line();
+ return;
+ }
if (ci != 0 /* nullptr */) {
// Call tok.next() only after making the node so that
// .mc \s+9\(br\s0 works.
@@ -3849,6 +3854,12 @@ static void add_hyphenation_exceptions()
while (i < WORD_MAX && !tok.is_space() && !tok.is_newline()
&& !tok.is_eof()) {
charinfo *ci = tok.read_troff_character(true /* required */);
+ if (using_character_classes && ci->is_class()) {
+ error("cannot use %1 in a hyphenation exception word",
+ tok.description());
+ skip_line();
+ return;
+ }
if (0 /* nullptr */ == ci) {
error("%1 has no associated character information(!)",
tok.description());
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 15d0a8495..166cf1752 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -1667,6 +1667,12 @@ node *do_overstrike() // \o
}
else {
charinfo *ci = tok.read_troff_character(true /* required */);
+ if (using_character_classes && ci->is_class()) {
+ error("cannot use %1 in overstrike escape sequence",
+ tok.description());
+ delete osnode;
+ return 0 /* nullptr */;
+ }
if (ci != 0 /* nullptr */) {
node *n = curenv->make_char_node(ci);
if (n != 0 /* nullptr */)
@@ -1720,6 +1726,12 @@ static node *do_bracket() // \b
&& (want_att_compat || input_stack::get_level() == start_level))
break;
charinfo *ci = tok.read_troff_character(true /* required */);
+ if (using_character_classes && ci->is_class()) {
+ error("cannot use %1 in bracket-building escape sequence",
+ tok.description());
+ delete bracketnode;
+ return 0 /* nullptr */;
+ }
if (ci != 0 /* nullptr */) {
node *n = curenv->make_char_node(ci);
if (n != 0 /* nullptr */)
@@ -2616,6 +2628,11 @@ void token::next()
nd = new zero_width_node(nd);
else {
charinfo *ci = read_troff_character(true /* required */);
+ if (using_character_classes && ci->is_class()) {
+ error("cannot use %1 as argument to zero-width escape"
+ " sequence", tok.description());
+ return;
+ }
if (0 /* nullptr */ == ci)
break;
node *gn = curenv->make_char_node(ci);
@@ -2922,13 +2939,15 @@ const char *token::description()
// character or character class names. Do something about that.
// (The truncation is visually indicated by the absence of a
// closing quotation mark.)
- if (using_character_classes
- && tok.read_troff_character()->is_class())
- (void) snprintf(buf, maxstr, "character class %c%s%c", qc, sc,
- qc);
- else
- (void) snprintf(buf, maxstr, "special character %c%s%c", qc, sc,
- qc);
+ static const char special_character[] = "special character";
+ static const char character_class[] = "character class";
+ const char *type = special_character;
+ if (using_character_classes) {
+ charinfo *ci = get_charinfo(nm);
+ if ((ci != 0 /* nullptr */) && ci->is_class())
+ type = character_class;
+ }
+ (void) snprintf(buf, maxstr, "%s %c%s%c", type, qc, sc, qc);
return buf;
}
case TOKEN_SPREAD:
@@ -3557,12 +3576,14 @@ void process_input_stack()
tok.description());
else {
reading_beginning_of_input_line = false;
+ debug("GBR1: calling process() on %1", tok.description());
tok.process();
}
break;
default:
{
reading_beginning_of_input_line = false;
+ debug("GBR2: calling process() on %1", tok.description());
tok.process();
break;
}
@@ -4911,6 +4932,7 @@ void define_character(char_mode mode, const char *font_name)
skip_line();
return;
}
+ // TODO: If `ci` is already a character class, clobber it.
if (font_name != 0 /* nullptr */) {
string s(font_name);
s += ' ';
@@ -5042,6 +5064,7 @@ static void remove_character()
charinfo *ci = tok.read_troff_character(true /* required */,
true /* suppress creation */);
+ // TODO: If `ci` is a character class, clobber it.
if (0 /* nullptr */ == ci) {
if (!tok.is_indexed_character())
warning(WARN_CHAR, "%1 is not defined", tok.description());
@@ -5919,6 +5942,11 @@ static bool get_line_arg(units *n, unsigned char si, charinfo **cip)
if (!(start_token == tok
&& input_stack::get_level() == start_level)) {
*cip = tok.read_troff_character(true /* required */);
+ if (using_character_classes && (*cip)->is_class()) {
+ error("cannot use %1 in line-drawing escape sequence",
+ tok.description());
+ return false;
+ }
tok.next();
}
if (!(start_token == tok
@@ -6229,6 +6257,7 @@ static void do_width() // \w
if (tok == start_token
&& (want_att_compat || input_stack::get_level() == start_level))
break;
+ debug("GBR3: calling process() on %1", tok.description());
tok.process();
}
env.wrap_up_tab();
@@ -6274,8 +6303,10 @@ void read_title_parts(node **part, hunits *part_width)
if ((page_character != 0 /* nullptr */)
&& (tok.read_troff_character() == page_character))
interpolate_register(percent_symbol, 0);
- else
+ else {
+ debug("GBR4: calling process() on %1", tok.description());
tok.process();
+ }
tok.next();
}
curenv->wrap_up_tab();
@@ -6909,6 +6940,7 @@ static bool are_comparands_equal()
&& (want_att_compat
|| input_stack::get_level() == delim_level))
break;
+ debug("GBR5: calling process() on %1", tok.description());
tok.process();
}
curenv = &env2;
@@ -8639,6 +8671,7 @@ void define_class()
skip_line();
return;
}
+ debug("GBR: now using character classes");
using_character_classes = true;
(void) char_class_dictionary.lookup(nm, ci);
skip_line();
@@ -8839,7 +8872,19 @@ void token::process()
curenv->space();
break;
case TOKEN_SPECIAL_CHAR:
- curenv->add_char(get_charinfo(nm));
+ {
+ charinfo *ci = get_charinfo(nm);
+ if (!using_character_classes) {
+ debug("GBR: token:process(): not using character classes");
+ curenv->add_char(get_charinfo(nm));
+ }
+ else if ((ci != 0 /* nullptr */) && !ci->is_class()) {
+ debug("GBR: token:process(): using character classes, but special character is not a character class");
+ curenv->add_char(get_charinfo(nm));
+ }
+ else
+ error("cannot interpolate %1", description());
+ }
break;
case TOKEN_SPREAD:
curenv->spread();
@@ -10081,6 +10126,7 @@ node *charinfo_to_node_list(charinfo *ci, const environment *envp)
break;
}
else
+ debug("GBR6: calling process() on %1", tok.description());
tok.process();
}
node *n = curenv->extract_output_line();
This work was only about 50% done.
I think I need to finish that work and see whether adding these
`ci->is_class()` guards restores acceptable performance. I suspect that
character classes are being traversed and searched when they need not be.
The order in which the parallelized tests finish supports this hypothesis.
PASS: src/utils/grog/tests/smoke-test.sh
PASS: src/roff/groff/tests/do-not-loop-infinitely-when-breaking-cjk.sh
PASS: src/roff/groff/tests/check-delimiter-validity.sh
PASS: contrib/hdtbl/examples/test-hdtbl.sh
PASS: tmac/tests/localization-works.sh
These guys reliably trail the pack as of this commit.
"check-delimiter-validity.sh" is expected to be slow, because it launches the
formatter dozens or hundreds of times in a still not-quite-exhaustive
exploration of the available escape sequence delimiter space. _grog_'s smoke
test has always been slow because it's a lot of Perl wrapped around running
_groff_ several times.
But the other three all hit the CJK macro files: two for what I suspect are
obvious reasons, and "test-hdtbl.sh" because it prints (empty) code charts for
the abstract CJK faces contributed by TANAKA Takuji. So all three of them
definitely exercise code paths involving the character class dictionary.
So, bottom line, "using_character_classes" is working _both_ as a performance
optimizer **and** as a parser-state-tweaking bit.
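The optimizer role falls out of C++'s short-circuit evaluation of `&&`: when the flag is false, the right operand is never evaluated, so whatever it costs is never paid. A toy illustration, assuming a hypothetical `expensive_is_class()` that stands in for any costly class lookup and merely counts its calls:

```cpp
#include <cassert>

// Hypothetical stand-in for a costly class membership search; it
// only counts how many times it is invoked.
static int lookups = 0;

bool expensive_is_class()
{
  ++lookups;
  return true;
}

// Guarded form: the right operand of && is not evaluated when the
// left operand is false, so the lookup is skipped entirely.
bool guarded(bool using_character_classes)
{
  return using_character_classes && expensive_is_class();
}

// Unguarded form: the lookup runs on every call.
bool unguarded()
{
  return expensive_is_class();
}
```

If something like this is going on, keeping the flag set for the rest of the run once any class is defined (as the REWRITE commit does) would pay the lookup cost on every guarded check, which would be consistent with the slowdown showing up only in tests that touch the CJK class definitions.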
I don't know if the former was intended, but I now think I can scotch the
variable and simplify these pending
if (using_character_classes && ci->is_class()) {
tests to just
if (ci->is_class()) {
...and if that's _still_ crappily slow, I can make `charinfo::is_class` either
ask the `char_class_dictionary` object if it's populated, or "memoize" (I've
always hated that word, though not the technique) the fact thereof.
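A minimal sketch of the "ask the dictionary whether it's populated" option, with the answer memoized in a flag that is set on the first class definition and never cleared. Everything here is hypothetical: `class_registry` stands in for `char_class_dictionary`, whose interface isn't shown in this report, and `toy_charinfo` stands in for `charinfo`. For a `std::set`, `empty()` is already constant-time, so the explicit flag only matters if the real dictionary cannot answer that question cheaply.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical registry standing in for char_class_dictionary.
class class_registry {
  std::set<std::string> names;
  // Memoized "has any class been defined?" answer. It only ever goes
  // from false to true; nothing resets it.
  bool populated;
public:
  class_registry() : populated(false) {}
  void define(const std::string &name) {
    names.insert(name);
    populated = true;
  }
  bool is_populated() const { return populated; }
  bool contains(const std::string &name) const {
    return names.count(name) != 0;
  }
};

// Hypothetical character record consulting the registry.
struct toy_charinfo {
  std::string name;
  const class_registry *registry;
  bool is_class() const {
    // Cheap early exit: with no classes defined, nothing is one.
    if (!registry->is_populated())
      return false;
    return registry->contains(name);
  }
};
```

Because the memoized flag never turns off again once a class exists, the early exit cannot cause an already-defined class to be ignored later in the input, unlike a flag that some other code path resets.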
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?67571>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
