On Tue, 10 Feb 2026 at 14:07, Tomasz Kaminski <[email protected]> wrote:
> On Mon, Feb 9, 2026 at 9:43 PM Jonathan Wakely <[email protected]> wrote:
>> @@ -278,43 +302,97 @@ namespace __detail
>>      regex_traits<_Ch_type>::
>>      lookup_classname(_Fwd_iter __first, _Fwd_iter __last, bool __icase)
>> const
>>      {
>> +      if constexpr (__is_any_random_access_iter<_Fwd_iter>::value)
>> +        if ((__last - __first) > 6) [[__unlikely__]]
>> +          return {}; // "xdigit" is the longest classname
>> +
>>        typedef std::ctype<char_type> __ctype_type;
>>        const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
>>
>> -      // Mappings from class name to class mask.
>> -      static const pair<const char*, char_class_type> __classnames[] =
>> -      {
>> -        {"d", ctype_base::digit},
>> -        {"w", {ctype_base::alnum, _RegexMask::_S_under}},
>> -        {"s", ctype_base::space},
>> -        {"alnum", ctype_base::alnum},
>> -        {"alpha", ctype_base::alpha},
>> -        {"blank", ctype_base::blank},
>> -        {"cntrl", ctype_base::cntrl},
>> -        {"digit", ctype_base::digit},
>> -        {"graph", ctype_base::graph},
>> -        {"lower", ctype_base::lower},
>> -        {"print", ctype_base::print},
>> -        {"punct", ctype_base::punct},
>> -        {"space", ctype_base::space},
>> -        {"upper", ctype_base::upper},
>> -        {"xdigit", ctype_base::xdigit},
>> +      auto __read_ch = [&]() -> char {
>> +        if (__first == __last)
>> +          return '\0';
>> +        char __c = __fctyp.narrow(__fctyp.tolower(*__first), 0);
>> +        ++__first;
>> +        return __c;
>>        };
>>
>> -      string __s;
>> -      for (; __first != __last; ++__first)
>> -        __s += __fctyp.narrow(__fctyp.tolower(*__first), 0);
>> +      auto __match = [&](const char* __s) -> bool {
>> +        do
>> +          if (__read_ch() != *__s)
>> +            return false;
>> +        while (*++__s);
>> +        return __first == __last;
>> +      };
>>
>> -      for (const auto& __it : __classnames)
>> -        if (__s == __it.first)
>> +      switch(__read_ch())
>
> That's a really cool idea, to switch on one character.

I first saw this technique about 20 years ago in some autogenerated code
for matching command names in a function dispatcher, where it provided no
performance benefits over just iterating over an array of names and using
strcmp on each one. But I think it's worth the extra complexity here (and
improves performance) because:

1) We have a fairly small set of strings to match which are defined in
   the standard and don't need to change often, so there's not much
   maintenance cost.

2) The input string is provided as an iterator pair [begin,end), not as
   a char*, so the original code was constructing a std::string to hold
   the string before matching it.

3) We aren't just matching the input string to the set of names, we also
   need to call ctype.narrow(ctype.tolower(*begin)) on each input
   character, so avoiding doing that up-front on the entire string means
   we only do that work as needed.

4) ... it's 20 years later, maybe optimizers have changed :-)

I'm testing the PATCH v2 series now ...
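
For anyone following along without the full patch, the shape of the
technique in points 2 and 3 can be sketched standalone roughly as below.
This is only an illustration and not the libstdc++ code: the name
lookup_class_sketch is made up, it handles just a handful of the class
names, it returns a plain std::ctype_base::mask instead of the library's
char_class_type, and it leaves out the random-access length check and
the __icase handling.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <locale>
#include <string>

// Hypothetical stand-in for the patched lookup_classname (illustration only).
template<typename FwdIter>
std::ctype_base::mask
lookup_class_sketch(FwdIter first, FwdIter last,
                    const std::locale& loc = std::locale::classic())
{
  using CharT = typename std::iterator_traits<FwdIter>::value_type;
  using cb = std::ctype_base;
  const auto& ct = std::use_facet<std::ctype<CharT>>(loc);

  // Lowercase and narrow one character on demand; '\0' signals end of input.
  auto read_ch = [&]() -> char {
    if (first == last)
      return '\0';
    char c = ct.narrow(ct.tolower(*first), 0);
    ++first;
    return c;
  };

  // True if the rest of the input is exactly s (s must be non-empty).
  auto match = [&](const char* s) -> bool {
    do
      if (read_ch() != *s)
        return false;
    while (*++s);
    return first == last;
  };

  // Dispatch on the first character, so no temporary std::string is built
  // and mismatches stop after converting only a character or two.
  switch (read_ch())
    {
    case 'a':
      if (read_ch() != 'l')
        break;
      switch (read_ch())
        {
        case 'n':
          if (match("um")) return cb::alnum;
          break;
        case 'p':
          if (match("ha")) return cb::alpha;
          break;
        }
      break;
    case 'd':
      if (first == last || match("igit")) return cb::digit; // "d" or "digit"
      break;
    case 's':
      if (first == last || match("pace")) return cb::space; // "s" or "space"
      break;
    case 'u':
      if (match("pper")) return cb::upper;
      break;
    case 'x':
      if (match("digit")) return cb::xdigit;
      break;
    }
  return cb::mask(); // empty mask: not a recognised class name
}
```

Because it only needs forward iteration, the same function should work
for a std::string, a std::wstring, or something like a std::list<wchar_t>,
and a name such as "alphabet" is rejected after the "alpha" prefix matches
because input remains.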
