Re: [PATCH 3/3] libstdc++: Optimize std::regex_traits lookup functions

Jonathan Wakely Tue, 10 Feb 2026 14:00:45 -0800

On Tue, 10 Feb 2026 at 14:07, Tomasz Kaminski <[email protected]> wrote:
> On Mon, Feb 9, 2026 at 9:43 PM Jonathan Wakely <[email protected]> wrote:
>> @@ -278,43 +302,97 @@ namespace __detail
>>      regex_traits<_Ch_type>::
>>      lookup_classname(_Fwd_iter __first, _Fwd_iter __last, bool __icase) const
>>      {
>> +      if constexpr (__is_any_random_access_iter<_Fwd_iter>::value)
>> +       if ((__last - __first) > 6) [[__unlikely__]]
>> +         return {}; // "xdigit" is the longest classname
>> +
>>        typedef std::ctype<char_type> __ctype_type;
>>        const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
>>
>> -      // Mappings from class name to class mask.
>> -      static const pair<const char*, char_class_type> __classnames[] =
>> -      {
>> -       {"d", ctype_base::digit},
>> -       {"w", {ctype_base::alnum, _RegexMask::_S_under}},
>> -       {"s", ctype_base::space},
>> -       {"alnum", ctype_base::alnum},
>> -       {"alpha", ctype_base::alpha},
>> -       {"blank", ctype_base::blank},
>> -       {"cntrl", ctype_base::cntrl},
>> -       {"digit", ctype_base::digit},
>> -       {"graph", ctype_base::graph},
>> -       {"lower", ctype_base::lower},
>> -       {"print", ctype_base::print},
>> -       {"punct", ctype_base::punct},
>> -       {"space", ctype_base::space},
>> -       {"upper", ctype_base::upper},
>> -       {"xdigit", ctype_base::xdigit},
>> +      auto __read_ch = [&]() -> char {
>> +       if (__first == __last)
>> +         return '\0';
>> +       char __c = __fctyp.narrow(__fctyp.tolower(*__first), 0);
>> +       ++__first;
>> +       return __c;
>>        };
>>
>> -      string __s;
>> -      for (; __first != __last; ++__first)
>> -       __s += __fctyp.narrow(__fctyp.tolower(*__first), 0);
>> +      auto __match = [&](const char* __s) -> bool {
>> +       do
>> +         if (__read_ch() != *__s)
>> +           return false;
>> +       while (*++__s);
>> +       return __first == __last;
>> +      };
>>
>> -      for (const auto& __it : __classnames)
>> -       if (__s == __it.first)
>> +      switch(__read_ch())
>
> That's a really cool idea, to switch on one character.


I first saw this technique about 20 years ago in some autogenerated
code for matching command names in a function dispatcher, where it
provided no performance benefits over just iterating over an array of
names and using strcmp on each one.

But I think it's worth the extra complexity here, and does improve
performance, because:
1) We have a fairly small set of strings to match which are defined in
the standard and don't need to change often, so there's not much
maintenance cost.
2) The input string is provided as an iterator pair [begin,end), not
as a char*, so the original code was constructing a std::string to
hold the string before matching it.
3) We aren't just matching the input string to the set of names, we
also need to call ctype.narrow(ctype.tolower(*begin)) on each input
character, so avoiding doing that up-front on the entire string means
we only do that work as needed.
4) ... it's 20 years later, maybe optimizers have changed :-)
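
For readers who want to try the idea outside libstdc++, here is a minimal
standalone sketch of the technique. It is not the code from the patch:
CharClass and lookup_class are hypothetical names, the input is assumed to be
narrow characters, and std::tolower stands in for the
ctype.narrow(ctype.tolower(c)) calls on the locale's facet.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical class identifiers, standing in for the ctype_base masks.
enum class CharClass {
  none, alnum, alpha, blank, cntrl, digit, graph, lower,
  print, punct, space, upper, xdigit, word
};

// Classify the name in [first, last) without building a temporary string.
// Assumes the iterators yield narrow characters.
template<typename FwdIter>
CharClass lookup_class(FwdIter first, FwdIter last)
{
  // Lower-case one character on demand; '\0' signals end of input.
  auto read_ch = [&]() -> char {
    if (first == last)
      return '\0';
    char c = static_cast<char>(
        std::tolower(static_cast<unsigned char>(*first)));
    ++first;
    return c;
  };

  // Match the remaining input against a non-empty suffix, and require
  // that the input ends exactly where the suffix does.
  auto match = [&](const char* s) -> bool {
    do
      if (read_ch() != *s)
        return false;
    while (*++s);
    return first == last;
  };

  auto rest = [&](const char* s, CharClass c) -> CharClass {
    return match(s) ? c : CharClass::none;
  };

  switch (read_ch())
  {
    case 'a':
      if (read_ch() != 'l')
        return CharClass::none;
      switch (read_ch())
      {
        case 'n': return rest("um", CharClass::alnum);
        case 'p': return rest("ha", CharClass::alpha);
        default:  return CharClass::none;
      }
    case 'b': return rest("lank", CharClass::blank);
    case 'c': return rest("ntrl", CharClass::cntrl);
    case 'd': // "d" and "digit" name the same class
      if (first == last)
        return CharClass::digit;
      return rest("igit", CharClass::digit);
    case 'g': return rest("raph", CharClass::graph);
    case 'l': return rest("ower", CharClass::lower);
    case 'p':
      switch (read_ch())
      {
        case 'r': return rest("int", CharClass::print);
        case 'u': return rest("nct", CharClass::punct);
        default:  return CharClass::none;
      }
    case 's': // "s" and "space" name the same class
      if (first == last)
        return CharClass::space;
      return rest("pace", CharClass::space);
    case 'u': return rest("pper", CharClass::upper);
    case 'w': return first == last ? CharClass::word : CharClass::none;
    case 'x': return rest("digit", CharClass::xdigit);
    default:  return CharClass::none;
  }
}

// Convenience overload for C strings (hypothetical, for illustration only).
inline CharClass lookup_class(const char* name)
{ return lookup_class(name, name + std::strlen(name)); }
```

With this sketch, lookup_class("XDigit") yields CharClass::xdigit and
lookup_class("digits") yields CharClass::none. A mismatch on the first or
second character returns without reading the rest of the input.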

I'm testing the PATCH v2 series now ...
