https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124015

            Bug ID: 124015
           Summary: regex_traits::lookup_classname returns
                    ctype_base::alpha for "graph" with icase=true
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
            Blocks: 102445
  Target Milestone: ---

This assertion fails on Cygwin and other newlib targets:

#include <assert.h>
#include <regex>
int main()
{
  const char* s1 = "graph";
  const char* s2 = "lower";
  std::regex_traits<char> t;
  auto g = t.lookup_classname(s1, s1+5, true);
  auto l = t.lookup_classname(s2, s2+5, true);
  assert(g != l);
}

By inspection, I think it will also fail for qnx, vxworks, and picolibc.

The problem is here:

      for (const auto& __it : __classnames)
        if (__s == __it.first)
          {
            if (__icase
                && ((__it.second
                     & (ctype_base::lower | ctype_base::upper)) != 0))
              return ctype_base::alpha;
            return __it.second;
          }

This logic means that any ctype_base mask that has any bits of lower|upper set
will be replaced by alpha.

But that's wrong for targets that do not have a distinct bit for alpha and
define it as lower|upper, e.g. in config/os/newlib/ctype_base.h:

    static const mask upper     = mask (_U);
    static const mask lower     = mask (_L);
    static const mask alpha     = mask (_U | _L);
    static const mask digit     = mask (_N);
    static const mask xdigit    = mask (_X | _N);
    static const mask space     = mask (_S);
    static const mask print     = mask (_P | _U | _L | _N | _B);
    static const mask graph     = mask (_P | _U | _L | _N);
    static const mask cntrl     = mask (_C);
    static const mask punct     = mask (_P);
    static const mask alnum     = mask (_U | _L | _N);

The condition ((x & (lower|upper)) != 0) is true when x is any of upper, lower,
alpha, print, graph, or alnum, so on these targets all six are replaced by
alpha. But the case-insensitive equivalent of graph is not alpha. The
replacement is only correct for lower and upper (and is harmless for alpha).
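
The effect can be modelled outside the library. This is only a sketch: the
namespace model, the function icase_current, and the numeric bit values are
invented for illustration (the real bits come from newlib's <ctype.h>); only
the way the masks combine is taken from the header quoted above.

```cpp
#include <cstdio>

namespace model {
using mask = unsigned;

// Hypothetical stand-ins for newlib's _U, _L, _N, ... bits. The numeric
// values are an assumption; only the way they combine matters here.
constexpr mask U = 0x01, L = 0x02, N = 0x04, S = 0x08;
constexpr mask P = 0x10, C = 0x20, X = 0x40, B = 0x80;

constexpr mask upper  = U;
constexpr mask lower  = L;
constexpr mask alpha  = U | L;
constexpr mask digit  = N;
constexpr mask xdigit = X | N;
constexpr mask space  = S;
constexpr mask print  = P | U | L | N | B;
constexpr mask graph  = P | U | L | N;
constexpr mask cntrl  = C;
constexpr mask punct  = P;
constexpr mask alnum  = U | L | N;

// Mirrors the current icase branch of lookup_classname.
constexpr mask icase_current(mask m)
{ return (m & (lower | upper)) != 0 ? alpha : m; }

static_assert(icase_current(graph) == icase_current(lower),
              "graph and lower become indistinguishable");
static_assert(icase_current(punct) == punct,
              "classes without upper/lower bits are left alone");

// Print every class whose mask is changed by the current logic.
inline void report()
{
  const struct { const char* name; mask m; } classes[] = {
    {"upper", upper}, {"lower", lower}, {"alpha", alpha}, {"digit", digit},
    {"xdigit", xdigit}, {"space", space}, {"print", print}, {"graph", graph},
    {"cntrl", cntrl}, {"punct", punct}, {"alnum", alnum},
  };
  for (const auto& c : classes)
    if (icase_current(c.m) != c.m)
      std::printf("%s -> alpha\n", c.name);
}
} // namespace model
```

Calling model::report() from main lists upper, lower, print, graph, and alnum;
alpha is not listed only because replacing it by alpha leaves it unchanged.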

I think the fix is:

--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -311,7 +311,7 @@ namespace __detail
            if (__icase
                && ((__it.second
                     & (ctype_base::lower | ctype_base::upper)) != 0))
-             return ctype_base::alpha;
+             return __it.second | ctype_base::alpha;
            return __it.second;
          }
       return 0;

i.e. if lower or upper is set, then also set alpha.
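
A rough check of the proposed expression, again as a standalone sketch rather
than the library code: the namespace model_fixed, the function icase_fixed, and
all bit values are hypothetical. The second half assumes a target with a
separate alpha bit (called A here) to show that the same expression still only
adds alpha.

```cpp
namespace model_fixed {
using mask = unsigned;

// Hypothetical bit values, chosen only so that they are distinct.
constexpr mask U = 0x01, L = 0x02, N = 0x04, P = 0x10;
constexpr mask A = 0x100;  // assumed separate alpha bit on some other target

// Proposed logic: if lower or upper is set, then also set alpha.
constexpr mask icase_fixed(mask m, mask lower, mask upper, mask alpha)
{ return (m & (lower | upper)) != 0 ? (m | alpha) : m; }

// newlib-style layout: alpha is defined as upper|lower.
constexpr mask nl_alpha = U | L;
constexpr mask nl_graph = P | U | L | N;

// graph already contains both bits, so it is returned unchanged.
static_assert(icase_fixed(nl_graph, L, U, nl_alpha) == nl_graph, "");
// lower and upper both widen to alpha, as before.
static_assert(icase_fixed(L, L, U, nl_alpha) == nl_alpha, "");
static_assert(icase_fixed(U, L, U, nl_alpha) == nl_alpha, "");
// The reproducer's condition (g != l) now holds in this model.
static_assert(icase_fixed(nl_graph, L, U, nl_alpha)
              != icase_fixed(L, L, U, nl_alpha), "");

// Layout with a distinct alpha bit: lower becomes lower|alpha.
static_assert(icase_fixed(L, L, U, A) == (L | A), "");
// Classes without upper/lower bits are untouched in either layout.
static_assert(icase_fixed(P, L, U, A) == P, "");
} // namespace model_fixed
```

All of the checks are static_asserts, so compiling the snippet is enough to
run them.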

Libc++ uses (x | ctype_base::alpha | ctype_base::lower | ctype_base::upper) for
this case, which seems entirely redundant.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102445
[Bug 102445] [meta-bug] std::regex issues
