[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris SPARC when parsing a simple character class

redi at gcc dot gnu.org Fri, 18 May 2018 05:06:39 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824


Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timshen at gcc dot gnu.org

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Wanying Luo from comment #0)
> When _M_transform() calls strxfrm() and gets -1 when converting 0x80 under
> the UTF-8 locale on Solaris SPARC, it simply assigns -1 to __res of type
> size_t which creates a very large number. This causes __ret.append(__c,
> __res) to crash. I think it would be nice if the code checks errno and
> issues a better error message than the one above.

N.B. it doesn't just crash, it throws an exception because it can't append
4294967295 bytes to a std::string. Any fix to check errno in
collate<char>::do_transform is still going to involve throwing an exception,
just a slightly different one.
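
To illustrate the failure mode described above, here is a minimal sketch (not the libstdc++ code itself). The helper name append_throws_length_error is hypothetical. It assumes only that an error return of (size_t)-1 ends up being used as the count passed to std::string::append, as the report describes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libstdc++): appends `count` copies of a
// char to an empty string and reports whether std::length_error was thrown.
bool append_throws_length_error(std::size_t count)
{
  std::string ret;
  try {
    // With count == (size_t)-1 the requested size exceeds max_size(),
    // so append() throws before trying to allocate anything.
    ret.append(count, '\0');
  } catch (const std::length_error&) {
    return true;
  }
  return false;
}
```

Calling the helper with static_cast<std::size_t>(-1) takes the exception path, while a small count such as 3 does not.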

The real problem is that std::regex wants to build a cache of every value from
CHAR_MIN to CHAR_MAX, to decide if it matches the bracket expression "[0-9]".
If calling strxfrm on any 8-bit char value produces an error then we're going
to get an exception. I think something in the regex compiler (maybe in
transform_primary) needs to handle those exceptions, either by deciding that
the characters that produce errors do not match, or maybe by disabling the
cache.
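
A rough sketch of the first option (characters whose transform fails simply do not match) might look like the following. This is not the regex compiler's actual code: fake_transform and build_cache are hypothetical names, and fake_transform only simulates a locale whose transform fails for bytes >= 0x80.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a locale-dependent transform. Assumption: it
// fails for every byte >= 0x80, mimicking the reported strxfrm behaviour.
std::string fake_transform(char c)
{
  if (static_cast<unsigned char>(c) >= 0x80)
    throw std::runtime_error("transform failed");
  return std::string(1, c);
}

// Builds a 256-entry match cache for the range [lo, hi], covering every
// value from CHAR_MIN to CHAR_MAX. A char whose transform throws is
// recorded as "does not match" instead of aborting the whole build.
std::bitset<256> build_cache(char lo, char hi)
{
  const std::string lo_key = fake_transform(lo);
  const std::string hi_key = fake_transform(hi);
  std::bitset<256> cache;
  for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
    const char c = static_cast<char>(i);
    const std::size_t slot = static_cast<unsigned char>(c);
    try {
      const std::string key = fake_transform(c);
      cache[slot] = (lo_key <= key && key <= hi_key);
    } catch (const std::runtime_error&) {
      cache[slot] = false;  // error while transforming: treat as no match
    }
  }
  return cache;
}
```

With this sketch, build_cache('0', '9') marks only the ten digits as matching, and every byte from 0x80 upwards is left unset rather than causing an exception to escape.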

Tim, I'll take care of checking errno in collate<>::_M_transform but could you
advise what to do about the regex compiler? Maybe:

--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -257,7 +257,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
          const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
          std::vector<char_type> __s(__first, __last);
          __fctyp.tolower(__s.data(), __s.data() + __s.size());
-         return this->transform(__s.data(), __s.data() + __s.size());
+         __try {
+           return this->transform(__s.data(), __s.data() + __s.size());
+         } __catch(const std::runtime_error&) {
+           return string_type();
+         }
        }

       /**
