Re: [PATCH] STDCXX-1073
On 10/25/12 11:31, Martin Sebor wrote: There are comments suggesting that calling do_transform() on the whole string may be suboptimal. Intuitively it makes sense that calling wcscoll() (in a loop, on the NUL-terminated substrings, if necessary) should be faster than simply calling do_transform() followed by wstring::compare(), but it would make sense to confirm the hypothesis before implementing the optimization. Alright, same files, two more patches. 2012-10-28 Liviu Nicoara lnico...@apache.org Fixes to collate facet and test enhancements: * src/collate.cpp (__rw_strnxfrm): corrected processing of embedded NULs. (__rw_wcsnxfrm) same (duplicated code). (collate_bynamewchar_t::do_compare): fixed string comparison return values, re-implemented the wcscoll-based comparison. * tests/localization/22.locale.collate.cpp: implemented a simpler collation test for strings with embedded NULs. All existing and new collate tests pass, but I think some more test enhancements are required. Liviu Index: tests/localization/22.locale.collate.cpp === --- tests/localization/22.locale.collate.cpp(revision 1402988) +++ tests/localization/22.locale.collate.cpp(working copy) @@ -116,7 +116,7 @@ const char* widen (char* dst, const char return dst; } -#ifndef _RWSTD_NO_WCHAR_T +#if !defined (_RWSTD_NO_WCHAR_T) int c_strcoll (const wchar_t* s1, const wchar_t* s2) { @@ -1029,65 +1029,119 @@ check_hash_eff (const char* charTname) template class charT void -check_NUL_locale (const char* charTname, const char* locname) +check_NUL_collate (const char* charTname, const char* locname, + const charT* s1, size_t s1_len, + const charT* s2, size_t s2_len) { std::locale loc (locname); -charT s [STR_SIZE]; -gen_str (s, STR_SIZE); +typedef typename std::collatecharT Collate; +typedef typename Collate::string_type String; -charT buf [2][STR_SIZE]; +const Collate col = std::use_facetCollate (loc); -std::memcpy (buf [0], s, sizeof s); -std::memcpy (buf [1], s, sizeof s); +const String x1 = col.transform (s1, s1 + s1_len); +const String x2 = col.transform (s2, s2 + s2_len); -// -// Verify that first buffer compares more: -// |0| = buf [0] -// |0| = buf [1] -// -buf [0][4] = charT (); -buf [1][3] = charT (); +const int colcmp = col.compare (s1, s1 + s1_len, s2, s2 + s2_len); -typedef std::collatecharT Collate; +int lexcmp = x1.compare (x2); +lexcmp = lexcmp -1 ? -1 : 1 lexcmp ? 1 : lexcmp; + +rw_assert (colcmp == lexcmp, __FILE__, __LINE__, + collate%s::compare (%{*.*Ac}, %{*.*Ac}) = %d, + lexicographical comparison of transformed strings = %d, + mismatch in locale (\%s\), charTname, + sizeof (charT), s1_len, s1, + sizeof (charT), s2_len, s2, + colcmp, lexcmp, locname); + +const bool eq = +std::string (s1, s1 + s1_len) == +std::string (s2, s2 + s2_len); + +rw_assert (bool (colcmp) != eq, __FILE__, __LINE__, + collate%s::compare (%{*.*Ac}, %{*.*Ac}) = %d, + lexicographical compare = %s, mismatch in locale (\%s\), + charTname, + sizeof (charT), s1_len, s1, + sizeof (charT), s2_len, s2, colcmp, + (eq ? true : false), locname); +} -const Collate col = std::use_facetCollate (loc); +static void +check_NUL_collate (const char* charTname, const char* locname, char) +{ +#define T(s, t) \ +check_NUL_collate (charTname, locname, \ + s, sizeof s / sizeof *s - 1, \ + t, sizeof t / sizeof *t - 1) + +T (, ); +T (, \0); +T (, \0\0); +T (\0, ); +T (\0, \0); +T (\0, \0\0); +T (a, \0); +T (a, \0a); +T (a, a\0); +T (a, a\0\0); +T (a\0, a); +T (a\0, a\0); +T (a\0, a\0\0); +T (\0a, ); +T (\0a, \0); +T (\0a, \0a); +T (\0a, \0a\0); +T (a\0\0b, ); +T (a\0\0b, a); +T (a\0\0b, ab); +T (a\0\0b, a\0); +T (a\0\0b, a\0\0); +T (a\0\0b, a\0b); +T (a\0\0b, a\0\0b); +} -int cmp = col.compare ( -buf [0], buf [0] + sizeof buf [0] / sizeof *buf [0], -buf [1], buf [1] + sizeof buf [1] / sizeof *buf [1]); - -rw_assert (cmp 0, __FILE__, __LINE__, - collate%s::compare (%{*.*Ac}, %{*.*Ac}) - 0, failed in locale (\%s\), charTname, - sizeof (charT), sizeof buf [0] / sizeof *buf [0], buf [0], - sizeof (charT), sizeof buf [1] / sizeof *buf [1], buf [1], - locname); - -std::memcpy (buf [0], s, sizeof s); -std::memcpy (buf [1], s, sizeof s); - -// -// Verify that first compare less: -//
Re: [PATCH] STDCXX-1073
On 10/25/2012 06:22 AM, Liviu Nicoara wrote: On 10/24/12 11:11, Martin Sebor wrote: On 10/24/2012 07:12 AM, Liviu Nicoara wrote: [...] I modified the test according to the suggestions. The test fails all corresponding wide-char cases and I am investigating other potential defects as well. For example, I do not think that employing strcoll and wcscoll in compare is correct as they stop at the first NUL, although strings may contain characters after the NUL that alter the result of the comparison. I would expect the wchar_t specialization to be analogous to the narrow one. In fact (without looking at the code), I would even think both could be implemented in terms of the same function template specialized on the character type and on the libc string function. (Although I'm not necessarily suggesting this as the solution to this issue.) They are not similar, AFAICT. In revision 367462 you implemented the wide byname do_compare in terms of wcscoll, if available, when using libc. That implementation does not take into account embedded NULs. This is in contrast with the narrow byname which simply transforms the strings and compares them. IIUC, rev 367462 actually implements the function in terms of do_transform() when wcscoll() isn't available as a workaround. Ironically, the workaround is actually better than the default implementation. Testing shows that an implementation of wide byname do_compare identical with the narrow version passes all tests, as expected. It's easy to me to just remove the wcscoll-based implementation. Also, error reporting from wcscoll seems difficult to use when libc does not have thread-local errno's, and right now we don't check for errors. OTOH, I expect wcscoll to be faster than the (simpler) transformation followed by comparison. I incline towards the simpler approach. Thoughts? There are comments suggesting that calling do_transform() on the whole string may be suboptimal. Intuitively it makes sense that calling wcscoll() (in a loop, on the NUL-terminated substrings, if necessary) should be faster than simply calling do_transform() followed by wstring::compare(), but it would make sense to confirm the hypothesis before implementing the optimization. Martin Liviu
Re: [PATCH] STDCXX-1073
On 10/24/2012 07:12 AM, Liviu Nicoara wrote: A few observations after spending a few hours working on the improved test. On 10/21/12 19:08, Martin Sebor wrote: ... There's no requirement that embedded NULs must be preserved (that's just how it happens to be implemented). I find it best to avoid relying on the knowledge of implementation details when exercising the library so that tests don't start failing after a conforming optimization or some such tweak is added to the code. I agree with the implementation details part. However, there is only one way NULs get processed and that is by keeping them in the string, verbatim. In this respect the previous test incarnation was a stronger test. Consider an encoding where UCHAR_MAX doesn't correspond to a valid character, and a hypothetical implementation that, before transforming the string, increments the value of each character by 1. That will lose all the initial embedded NULs. I modified the test according to the suggestions. The test fails all corresponding wide-char cases and I am investigating other potential defects as well. For example, I do not think that employing strcoll and wcscoll in compare is correct as they stop at the first NUL, although strings may contain characters after the NUL that alter the result of the comparison. I would expect the wchar_t specialization to be analogous to the narrow one. In fact (without looking at the code), I would even think both could be implemented in terms of the same function template specialized on the character type and on the libc string function. (Although I'm not necessarily suggesting this as the solution to this issue.) Attached is a test case that reproduces the bug without relying on implementation details. The test passes with your patch, confirming that it's good. This test still makes the assumption that strings that lexicographically distinct strings compare unequal in the en_US locale, but I think that's a safe assumption regardless of the encoding (though only for the hardcoded strings). All narrow test cases pass fine, but I gotta look into the wide-char failures. Good to hear! I wish I had more time to look into it with you. Martin Liviu
Re: [PATCH] STDCXX-1073
The library patch seems correct but the test case in the issue and the new test aren't strictly conforming. The only requirement on collate::tranform() is that it return... A basic_stringcharT value that, compared lexicographically with the result of calling transform() on another string, yields the same result as calling do_compare() on the same two strings. There's no requirement that embedded NULs must be preserved (that's just how it happens to be implemented). I find it best to avoid relying on the knowledge of implementation details when exercising the library so that tests don't start failing after a conforming optimization or some such tweak is added to the code. Attached is a test case that reproduces the bug without relying on implementation details. The test passes with your patch, confirming that it's good. This test still makes the assumption that strings that lexicographically distinct strings compare unequal in the en_US locale, but I think that's a safe assumption regardless of the encoding (though only for the hardcoded strings). Martin On 10/18/2012 04:28 PM, Liviu Nicoara wrote: On 10/18/12 11:02, Martin Sebor wrote: On 10/16/2012 10:38 AM, Liviu Nicoara wrote: I have enhanced the collation test, 22.locale.collate.cpp with a bunch of cases that deal with embedded strings, inside the input strings as well as at the edges. More defects became apparent, and they have been fixed. I have re-worked the src/collate.cpp patch and I am attaching it here. All collation tests (old and new) pass. If there are no objections, I will check it in later in the week. There are tabs in the patch (we need spaces). They also make the patch harder to review than it otherwise would be. I would also suggest keeping the MSVC 8.0 block and cleaning things up separately. That will also make the patch easier to review. (I'm so swamped that this matters to me.) I looked at both (latest) patches and they do not have tabs in them. They were sent in my last post on the 16th, the one you replied to. The diff without -w is quite a bit verbose because I moved a whole block in both collate.cpp functions, inside an if, so there are whitespace differences. svn diff -w gave me a quite decent view of the changes without having to looks at the actual patched source file. I am re-attaching the patches, this time created without -w so they would be a bit verbose. No tabs this time either. I haven't looked at the test patch yet. If you could clean up the tab issue and reduce the amount of unnecessary whitespace changes I promise to review it this weekend. The test enhancements are pretty plain, just a bunch of added stuff. Thanks, Liviu #include cassert #include cstdio #include locale int main () { const std::locale loc (en_US); typedef std::collatechar C; const C col = std::use_facetC(loc); static const struct { const char *a; unsigned n1; const char *b; unsigned n2; } tests[] = { #define T(s, t) { s, sizeof s - 1, t, sizeof t - 1 } T (, ), T (, \0), T (, \0\0), T (\0, ), T (\0, \0), T (\0, \0\0), T (a, \0), T (a, \0a), T (a, a\0), T (a, a\0\0), T (a\0, a), T (a\0, a\0), T (a\0, a\0\0), T (\0a, ), T (\0a, \0), T (\0a, \0a), T (\0a, \0a\0), T (a\0\0b, ), T (a\0\0b, a), T (a\0\0b, ab), T (a\0\0b, a\0), T (a\0\0b, a\0\0), T (a\0\0b, a\0b), T (a\0\0b, a\0\0b), }; for (unsigned i = 0; i != sizeof tests / sizeof *tests; ++i) { const char* const a = tests [i].a; unsigned n1 = tests [i].n1; const char* const b = tests [i].b; unsigned n2 = tests [i].n2; const std::string x1 = col.transform (a, a + n1); const std::string x2 = col.transform (b, b + n2); const int colcmp = col.compare (a, a + n1, b, b + n2); int lexcmp = x1.compare (x2); lexcmp = lexcmp -1 ? -1 : 1 lexcmp ? 1 : lexcmp; std::printf (%u: %i == %i\n, i, lexcmp, colcmp); assert (colcmp == lexcmp); if (colcmp) assert (std::string (a, a + n1) != std::string (b, b + n2)); else assert (std::string (a, a + n1) == std::string (b, b + n2)); } }
Re: [PATCH] STDCXX-1073
On 10/21/12 19:08, Martin Sebor wrote: There's no requirement that embedded NULs must be preserved (that's just how it happens to be implemented). I find it best to avoid relying on the knowledge of implementation details when exercising the library so that tests don't start failing after a conforming optimization or some such tweak is added to the code. That's right. I'll improve it. Thanks for the review. Liviu
Re: [PATCH] STDCXX-1073
On 10/16/2012 10:38 AM, Liviu Nicoara wrote: I have enhanced the collation test, 22.locale.collate.cpp with a bunch of cases that deal with embedded strings, inside the input strings as well as at the edges. More defects became apparent, and they have been fixed. I have re-worked the src/collate.cpp patch and I am attaching it here. All collation tests (old and new) pass. If there are no objections, I will check it in later in the week. There are tabs in the patch (we need spaces). They also make the patch harder to review than it otherwise would be. I would also suggest keeping the MSVC 8.0 block and cleaning things up separately. That will also make the patch easier to review. (I'm so swamped that this matters to me.) I haven't looked at the test patch yet. If you could clean up the tab issue and reduce the amount of unnecessary whitespace changes I promise to review it this weekend. Martin Thanks, Liviu On 10/13/12 11:16, Liviu Nicoara wrote: The initial patch does not pass the following test case. Have re-worked the patch and attached it to the incident, and I am also attaching it here. It passes all collate tests.
Re: [PATCH] STDCXX-1073
On 10/18/12 11:02, Martin Sebor wrote: On 10/16/2012 10:38 AM, Liviu Nicoara wrote: I have enhanced the collation test, 22.locale.collate.cpp with a bunch of cases that deal with embedded strings, inside the input strings as well as at the edges. More defects became apparent, and they have been fixed. I have re-worked the src/collate.cpp patch and I am attaching it here. All collation tests (old and new) pass. If there are no objections, I will check it in later in the week. There are tabs in the patch (we need spaces). They also make the patch harder to review than it otherwise would be. I would also suggest keeping the MSVC 8.0 block and cleaning things up separately. That will also make the patch easier to review. (I'm so swamped that this matters to me.) I looked at both (latest) patches and they do not have tabs in them. They were sent in my last post on the 16th, the one you replied to. The diff without -w is quite a bit verbose because I moved a whole block in both collate.cpp functions, inside an if, so there are whitespace differences. svn diff -w gave me a quite decent view of the changes without having to looks at the actual patched source file. I am re-attaching the patches, this time created without -w so they would be a bit verbose. No tabs this time either. I haven't looked at the test patch yet. If you could clean up the tab issue and reduce the amount of unnecessary whitespace changes I promise to review it this weekend. The test enhancements are pretty plain, just a bunch of added stuff. Thanks, Liviu Index: tests/localization/22.locale.collate.cpp === --- tests/localization/22.locale.collate.cpp(revision 1399886) +++ tests/localization/22.locale.collate.cpp(working copy) @@ -1029,7 +1029,105 @@ check_hash_eff (const char* charTname) template class charT void -check_NUL_locale (const char* charTname, const char* locname) +check_NUL_transform (const char* charTname, const char* locname, + const charT* src, size_t src_len, + const char* pat, size_t pat_len) +{ +std::locale loc (locname); + +typedef typename std::collatecharT Collate; +typedef typename Collate::string_type String; + +const Collate col = std::use_facetCollate (loc); + +String s = col.transform (src, src + src_len); + +for (size_t i = 0, j = 0; i pat_len; ++i) { + +if (j = s.size ()) { +rw_assert (false, __FILE__, __LINE__, + collate%s::transform (%{*.*Ac}) - %{*.*Ac} + , incomplete transformation (locale \%s\), + charTname, sizeof (charT), src_len, src, + sizeof (charT), s.size (), s.c_str (), locname); +break; +} + +switch (pat [i]) { +case 0: +rw_assert (j s.size () 0 == s [j], __FILE__, __LINE__, + collate%s::transform (\%{*.*Ac}\) - \%{*.*Ac}\ + , expected \\0, got %d in position %lu (locale \%s\), + charTname, sizeof (charT), src_len, src, sizeof (charT), + s.size (), s.c_str (), s [j], j, locname); +++j; +break; + +case '*': +rw_assert (j s.size () s [j], __FILE__, __LINE__, + collate%s::transform (\%{*.*Ac}\) - \%{*.*Ac}\ + , got NUL in position %lu (locale \%s\), charTname, + sizeof (charT), src_len, src, sizeof (charT), s.size (), + s.c_str (), j, locname); +for (; j s.size () s [j]; ++j) ; +break; + +default: +break; +} +} +} + +static void +check_NUL_transform_narrow (const char* charTname, const char* locname) +{ +#define T(src, pat) \ +check_NUL_transform (charTname, locname,\ + src, sizeof src / sizeof *src - 1, \ + pat, sizeof pat / sizeof *pat - 1) + +T (\0,\0 ); +T (\0\0, \0\0 ); +T (\0\0\0,\0\0\0 ); +T (a\0, *\0 ); +T (\0a\0, \0*\0); +T (\0a\0, \0*\0); +T (a\0\0, *\0\0); +T (a\0\0\0, *\0\0\0 ); +T (a\0b, *\0* ); +T (a\0b\0,*\0*\0 ); +T (a\0b\0\0, *\0*\0\0 ); +} + +static void +check_NUL_transform_wide (const char* charTname, const char* locname) +{ +T (L\0, \0 ); +T (L\0\0, \0\0 ); +T (L\0\0\0, \0\0\0 ); +T (La\0, *\0 ); +T (L\0a\0,\0*\0); +T (L\0a\0,\0*\0); +T (La\0\0,*\0\0); +T (La\0\0\0, *\0\0\0 ); +T (La\0b, *\0* ); +T (La\0b\0, *\0*\0 ); +T (La\0b\0\0, *\0*\0\0 ); + +#undef T +} + +template class charT +void +check_NUL_transform (const char*
Re: [PATCH] STDCXX-1073
I have enhanced the collation test, 22.locale.collate.cpp with a bunch of cases that deal with embedded strings, inside the input strings as well as at the edges. More defects became apparent, and they have been fixed. I have re-worked the src/collate.cpp patch and I am attaching it here. All collation tests (old and new) pass. If there are no objections, I will check it in later in the week. Thanks, Liviu On 10/13/12 11:16, Liviu Nicoara wrote: The initial patch does not pass the following test case. Have re-worked the patch and attached it to the incident, and I am also attaching it here. It passes all collate tests. Index: tests/localization/22.locale.collate.cpp === --- tests/localization/22.locale.collate.cpp(revision 1397847) +++ tests/localization/22.locale.collate.cpp(working copy) @@ -1029,7 +1029,105 @@ check_hash_eff (const char* charTname) template class charT void -check_NUL_locale (const char* charTname, const char* locname) +check_NUL_transform (const char* charTname, const char* locname, + const charT* src, size_t src_len, + const char* pat, size_t pat_len) +{ +std::locale loc (locname); + +typedef typename std::collatecharT Collate; +typedef typename Collate::string_type String; + +const Collate col = std::use_facetCollate (loc); + +String s = col.transform (src, src + src_len); + +for (size_t i = 0, j = 0; i pat_len; ++i) { + +if (j = s.size ()) { +rw_assert (false, __FILE__, __LINE__, + collate%s::transform (%{*.*Ac}) - %{*.*Ac} + , incomplete transformation (locale \%s\), + charTname, sizeof (charT), src_len, src, + sizeof (charT), s.size (), s.c_str (), locname); +break; +} + +switch (pat [i]) { +case 0: +rw_assert (j s.size () 0 == s [j], __FILE__, __LINE__, + collate%s::transform (\%{*.*Ac}\) - \%{*.*Ac}\ + , expected \\0, got %d in position %lu (locale \%s\), + charTname, sizeof (charT), src_len, src, sizeof (charT), + s.size (), s.c_str (), s [j], j, locname); +++j; +break; + +case '*': +rw_assert (j s.size () s [j], __FILE__, __LINE__, + collate%s::transform (\%{*.*Ac}\) - \%{*.*Ac}\ + , got NUL in position %lu (locale \%s\), charTname, + sizeof (charT), src_len, src, sizeof (charT), s.size (), + s.c_str (), j, locname); +for (; j s.size () s [j]; ++j) ; +break; + +default: +break; +} +} +} + +static void +check_NUL_transform_narrow (const char* charTname, const char* locname) +{ +#define T(src, pat) \ +check_NUL_transform (charTname, locname,\ + src, sizeof src / sizeof *src - 1, \ + pat, sizeof pat / sizeof *pat - 1) + +T (\0,\0 ); +T (\0\0, \0\0 ); +T (\0\0\0,\0\0\0 ); +T (a\0, *\0 ); +T (\0a\0, \0*\0); +T (\0a\0, \0*\0); +T (a\0\0, *\0\0); +T (a\0\0\0, *\0\0\0 ); +T (a\0b, *\0* ); +T (a\0b\0,*\0*\0 ); +T (a\0b\0\0, *\0*\0\0 ); +} + +static void +check_NUL_transform_wide (const char* charTname, const char* locname) +{ +T (L\0, \0 ); +T (L\0\0, \0\0 ); +T (L\0\0\0, \0\0\0 ); +T (La\0, *\0 ); +T (L\0a\0,\0*\0); +T (L\0a\0,\0*\0); +T (La\0\0,*\0\0); +T (La\0\0\0, *\0\0\0 ); +T (La\0b, *\0* ); +T (La\0b\0, *\0*\0 ); +T (La\0b\0\0, *\0*\0\0 ); + +#undef T +} + +template class charT +void +check_NUL_transform (const char* charTname, const char* locname) +{ +check_NUL_transform_narrow (charTname, locname); +check_NUL_transform_wide (charTname, locname); +} + +template class charT +void +check_NUL_collate (const char* charTname, const char* locname) { std::locale loc (locname); @@ -1090,6 +1188,14 @@ check_NUL_locale (const char* charTname, template class charT void +check_NUL (const char* charTname, const char* locname) +{ +check_NUL_transform charT (charTname, locname); +check_NUL_collate charT(charTname, locname); +} + +template class charT +void check_NUL (const char* charTname) { // Verify that the collate facet correctly handles character @@ -1103,7 +1209,7 @@ check_NUL (const char* charTname) for (const char* locname = rw_locales (LC_COLLATE); *locname; locname += std::strlen (locname) + 1, ++i) { try { -check_NUL_localecharT (charTname, locname); +
Re: [PATCH] STDCXX-1073
The initial patch does not pass the following test case. Have re-worked the patch and attached it to the incident, and I am also attaching it here. It passes all collate tests. Here is the second test case: $ cat ../../tests/localization/t.cpp; nice make t.cpp ./t af_ZA.utf8; echo $? #include iostream #include locale #include string int main (int argc, char** argv) { std::locale loc (argv [1]); const std::collate char fac = std::use_facet std::collate char (loc); char const buf [] = a\0\0b; std::string s = fac.transform (buf, buf + sizeof buf - 1); size_t i = 0; for (; i s.size () 0 == s [i]; ++i) ; return !(i == 2); } 1 On 10/10/12 08:25, Liviu Nicoara wrote: 2012-10-10 Liviu Nicoara lnico...@apache.org * src/collate.cpp (__rw_strnxfrm): preserved embedded NULs Index: src/collate.cpp === --- src/collate.cpp (revision 1397825) +++ src/collate.cpp (working copy) @@ -480,113 +480,103 @@ { _STD::string res; -char buf [256]; -char *pbuf = buf; - +char buf [256], *pbuf = buf, *psrc = buf; size_t bufsize = sizeof buf; -char *psrc = buf; while (nchars) { -// using a C-style cast instead of static_cast to avoid -// a gcc 2.95.2 bug causing an error on some platforms: -// static_cast from `void *' to `const char *' -const char* const last = (const char*)memchr (src, '\0', nchars); - -if (0 == last) { - -// no NUL found in the initial portion of the source string -// that fits into the local temporary buffer; copy as many -// characters as fit into the buffer +if (src [0]) { -if (bufsize = nchars) { -if (pbuf != buf) -delete[] pbuf; -pbuf = new char [nchars + 1]; +// using a C-style cast instead of static_cast to avoid +// a gcc 2.95.2 bug causing an error on some platforms: +// static_cast from `void *' to `const char *' +const char* const last = (const char*)memchr (src, '\0', nchars); + +if (0 == last) { +// no NUL found in the initial portion of the source string +// that fits into the local temporary buffer; copy as many +// characters as fit into the buffer + +if (bufsize = nchars) { +if (pbuf != buf) +delete[] pbuf; +pbuf = new char [nchars + 1]; +} + +psrc = pbuf; +memcpy (psrc, src, nchars); + +// append a terminating NUL and decrement the number +// of characters that remain to be processed +psrc [nchars] = '\0'; +src += nchars; +nchars= 0; +} +else { +// terminating NUL found in the source buffer +nchars -= (last - src) + 1; +psrc= _RWSTD_CONST_CAST (char*, src); +src+= (last - src) + 1; } -psrc = pbuf; -memcpy (psrc, src, nchars); +#ifdef _RWSTD_OS_SUNOS +// Solaris 10u5 on AMD64 overwrites memory past the end of +// just_in_case_buf[8], to avoid this, pass a null pointer +char* const just_in_case_buf = 0; +#else +// provide a destination buffer to strxfrm() in case +// it's buggy (such as MSVC's) and tries to write to +// the buffer even if it's 0 +char just_in_case_buf [8]; +#endif // _RWSTD_OS_SUNOS -// append a terminating NUL and decrement the number -// of characters that remain to be processed -psrc [nchars] = '\0'; -src += nchars; -nchars= 0; -} -else { +const size_t dst_size = strxfrm (just_in_case_buf, psrc, 0); -// terminating NUL found in the source buffer -nchars -= (last - src) + 1; -psrc= _RWSTD_CONST_CAST (char*, src); -src+= (last - src) + 1; -} +// check for strxfrm() errors +if (0 == (dst_size 1)) { +if (pbuf != buf) +delete[] pbuf; -#ifdef _RWSTD_OS_SUNOS -// Solaris 10u5 on AMD64 overwrites memory past the end of -// just_in_case_buf[8], to avoid this, pass a null pointer -char* const just_in_case_buf = 0; -#else -// provide a destination buffer to strxfrm() in case -// it's buggy (such as MSVC's) and tries to write to -// the buffer even if it's 0 -char just_in_case_buf [8]; -#endif - -const size_t dst_size = strxfrm (just_in_case_buf, psrc, 0); - -// check for strxfrm() errors -if (0 == (dst_size 1)) { -
[PATCH] STDCXX-1073
2012-10-10 Liviu Nicoara lnico...@apache.org * src/collate.cpp (__rw_strnxfrm): preserved embedded NULs Index: src/collate.cpp === --- src/collate.cpp (revision 1392832) +++ src/collate.cpp (working copy) @@ -547,7 +547,7 @@ _TRY { // resize the result string to fit itself plus the result -// of the transformation including the terminatin NUL +// of the transformation including the terminating NUL // appended by strxfrm() res.resize (res_size + dst_size + 1); } @@ -557,36 +557,15 @@ _RETHROW; } -// transfor the source string up to the terminating NUL -size_t xfrm_size = -strxfrm (res [0] + res_size, psrc, dst_size + 1); - -#if defined _MSC_VER _MSC_VER 1400 -// compute the correct value that should have been returned from -// strxfrm() after the transformation has completed (MSVC strxfrm() -// returns a bogus result; see PR #29935) -xfrm_size = strlen (res [0] + res_size); -#endif // MSVC 8.0 - -// increment the size of the result string by the number -// of transformed characters excluding the terminating NUL -// if strxfrm() transforms the empty string into the empty -// string, keep the terminating NUL, otherwise drop it -res_size += xfrm_size + (last !*psrc !xfrm_size); - -_TRY { -res.resize (res_size); -} -_CATCH (...) { -if (pbuf != buf) -delete[] pbuf; -_RETHROW; -} +strxfrm (res [0] + res_size, psrc, dst_size + 1); } if (pbuf != buf) delete[] pbuf; +if (!res.empty ()) +res.resize (res.size () - 1); + return res; }