https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124015
Bug ID: 124015
Summary: regex_traits::lookup_classname returns ctype_base::alpha for "graph" with icase=true
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: redi at gcc dot gnu.org
Blocks: 102445
Target Milestone: ---
This assertion fails on Cygwin and other newlib targets:
#include <assert.h>
#include <regex>

int main()
{
  const char* s1 = "graph";
  const char* s2 = "lower";
  std::regex_traits<char> t;
  auto g = t.lookup_classname(s1, s1+5, true);
  auto l = t.lookup_classname(s2, s2+5, true);
  assert(g != l);
}
By inspection, I think it will also fail for qnx, vxworks, and picolibc.
The problem is here:
      for (const auto& __it : __classnames)
        if (__s == __it.first)
          {
            if (__icase
                && ((__it.second
                     & (ctype_base::lower | ctype_base::upper)) != 0))
              return ctype_base::alpha;
            return __it.second;
          }
This logic means that any ctype_base mask that has any bits of lower|upper set
will be replaced by alpha.
But that's wrong for targets that do not have a distinct bit for alpha and
define it as lower|upper, e.g. from config/os/newlib/ctype_base.h
    static const mask upper     = mask (_U);
    static const mask lower     = mask (_L);
    static const mask alpha     = mask (_U | _L);
    static const mask digit     = mask (_N);
    static const mask xdigit    = mask (_X | _N);
    static const mask space     = mask (_S);
    static const mask print     = mask (_P | _U | _L | _N | _B);
    static const mask graph     = mask (_P | _U | _L | _N);
    static const mask cntrl     = mask (_C);
    static const mask punct     = mask (_P);
    static const mask alnum     = mask (_U | _L | _N);
The condition ((x & (lower|upper)) != 0) is true for all x in upper, lower,
alpha, print, graph, and alnum. But the case-insensitive equivalent of graph is
not alpha. The logic is only correct for lower and upper (and is harmless for
alpha).
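The collapse described above can be reproduced without a newlib target. The following is only a sketch: the namespace newlib_model, the bit_* constants and their values, and the helpers lookup_before_fix and check_collapse are all hypothetical names made up for illustration. Only the relationships between the masks are taken from the ctype_base.h excerpt.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical model of the newlib mask layout quoted above. The bit values
// are assumptions for illustration; only their distinctness matters.
namespace newlib_model {
using mask = unsigned;

constexpr mask bit_U = 0x01, bit_L = 0x02, bit_N = 0x04, bit_S = 0x08,
               bit_P = 0x10, bit_C = 0x20, bit_X = 0x40, bit_B = 0x80;

constexpr mask upper  = bit_U;
constexpr mask lower  = bit_L;
constexpr mask alpha  = bit_U | bit_L;
constexpr mask digit  = bit_N;
constexpr mask xdigit = bit_X | bit_N;
constexpr mask space  = bit_S;
constexpr mask print  = bit_P | bit_U | bit_L | bit_N | bit_B;
constexpr mask graph  = bit_P | bit_U | bit_L | bit_N;
constexpr mask cntrl  = bit_C;
constexpr mask punct  = bit_P;
constexpr mask alnum  = bit_U | bit_L | bit_N;

// Mirrors the branch quoted from regex.tcc, before the proposed fix.
constexpr mask lookup_before_fix(mask m, bool icase) {
  return (icase && (m & (lower | upper)) != 0) ? alpha : m;
}
}  // namespace newlib_model

// Runtime checks of which class masks collapse to alpha in this model.
inline void check_collapse() {
  namespace nm = newlib_model;
  // Every class containing the upper or lower bit becomes alpha with icase.
  for (nm::mask m : {nm::upper, nm::lower, nm::alpha,
                     nm::print, nm::graph, nm::alnum})
    assert(nm::lookup_before_fix(m, true) == nm::alpha);
  // Classes without those bits are returned unchanged.
  for (nm::mask m : {nm::digit, nm::xdigit, nm::space, nm::cntrl, nm::punct})
    assert(nm::lookup_before_fix(m, true) == m);
  // Without icase nothing is rewritten.
  assert(nm::lookup_before_fix(nm::graph, false) == nm::graph);
}
```

In this model "graph" and "lower" both map to alpha when icase is true, which is the equality that the assertion in the test case rejects.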
I think the fix is:
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -311,7 +311,7 @@ namespace __detail
            if (__icase
                && ((__it.second
                     & (ctype_base::lower | ctype_base::upper)) != 0))
-             return ctype_base::alpha;
+             return __it.second | ctype_base::alpha;
            return __it.second;
          }
      return 0;
i.e. if lower or upper is set, then also set alpha.
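As a rough check of the proposed change, the sketch below applies the patched expression to two made-up mask layouts: one where alpha is lower|upper (as in the newlib excerpt) and one where alpha and graph have bits of their own. The layouts, bit values, and the names fix_model, shared_alpha, distinct_alpha, lookup_after_fix and check_fix are assumptions for illustration, not taken from any real ctype_base.h.

```cpp
#include <cassert>

namespace fix_model {
using mask = unsigned;

// Hypothetical newlib-like layout: alpha has no bit of its own.
struct shared_alpha {
  static constexpr mask upper = 0x01, lower = 0x02, digit = 0x04, punct = 0x10;
  static constexpr mask alpha = upper | lower;
  static constexpr mask graph = punct | upper | lower | digit;
};

// Hypothetical layout where alpha and graph are distinct bits.
struct distinct_alpha {
  static constexpr mask upper = 0x01, lower = 0x02, alpha = 0x04,
                        digit = 0x08, punct = 0x10, graph = 0x20;
};

// Mirrors the branch from regex.tcc with the proposed one-line change:
// keep the looked-up mask and add alpha, instead of replacing it.
template <typename Layout>
constexpr mask lookup_after_fix(mask m, bool icase) {
  return (icase && (m & (Layout::lower | Layout::upper)) != 0)
             ? (m | Layout::alpha)
             : m;
}
}  // namespace fix_model

inline void check_fix() {
  using namespace fix_model;
  // Shared-bit layout: graph survives, lower and upper widen to alpha.
  assert(lookup_after_fix<shared_alpha>(shared_alpha::graph, true)
         == shared_alpha::graph);
  assert(lookup_after_fix<shared_alpha>(shared_alpha::lower, true)
         == shared_alpha::alpha);
  assert(lookup_after_fix<shared_alpha>(shared_alpha::upper, true)
         == shared_alpha::alpha);
  // Distinct-bit layout: lower gains the alpha bit, graph is untouched.
  assert(lookup_after_fix<distinct_alpha>(distinct_alpha::lower, true)
         == (distinct_alpha::lower | distinct_alpha::alpha));
  assert(lookup_after_fix<distinct_alpha>(distinct_alpha::graph, true)
         == distinct_alpha::graph);
}
```

Under these assumptions "graph" and "lower" no longer compare equal in the shared-bit layout, and classes without the lower or upper bits pass through unchanged in both layouts.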
Libc++ uses (x | ctype_base::alpha | ctype_base::lower | ctype_base::upper) for
this case, which seems entirely redundant.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102445
[Bug 102445] [meta-bug] std::regex issues