Re: [PATCH] STDCXX-1073

2012-10-28 Thread Liviu Nicoara

On 10/25/12 11:31, Martin Sebor wrote:

There are comments suggesting that calling do_transform()
on the whole string may be suboptimal. Intuitively it makes
sense that calling wcscoll() (in a loop, on the NUL-terminated
substrings, if necessary) should be faster than simply calling
do_transform() followed by wstring::compare(), but it would
make sense to confirm the hypothesis before implementing the
optimization.


Alright, same files, two more patches.

2012-10-28  Liviu Nicoara  lnico...@apache.org

Fixes to collate facet and test enhancements:

* src/collate.cpp (__rw_strnxfrm): corrected processing of
  embedded NULs. (__rw_wcsnxfrm) same (duplicated code).
  (collate_bynamewchar_t::do_compare): fixed string comparison
  return values, re-implemented the wcscoll-based comparison.
* tests/localization/22.locale.collate.cpp: implemented a simpler
  collation test for strings with embedded NULs.

All existing and new collate tests pass, but I think some more test 
enhancements are required.

Liviu
Index: tests/localization/22.locale.collate.cpp
===
--- tests/localization/22.locale.collate.cpp(revision 1402988)
+++ tests/localization/22.locale.collate.cpp(working copy)
@@ -116,7 +116,7 @@ const char* widen (char* dst, const char
 return dst;
 }
 
-#ifndef _RWSTD_NO_WCHAR_T
+#if !defined (_RWSTD_NO_WCHAR_T)
 
 int c_strcoll (const wchar_t* s1, const wchar_t* s2)
 {
@@ -1029,65 +1029,119 @@ check_hash_eff (const char* charTname)
 
 template class charT
 void
-check_NUL_locale (const char* charTname, const char* locname)
+check_NUL_collate (const char* charTname, const char* locname,
+   const charT* s1, size_t s1_len,
+   const charT* s2, size_t s2_len)
 {
 std::locale loc (locname);
 
-charT s [STR_SIZE];
-gen_str (s, STR_SIZE);
+typedef typename std::collatecharT Collate;
+typedef typename Collate::string_type String;
 
-charT buf [2][STR_SIZE];
+const Collate col = std::use_facetCollate (loc);
 
-std::memcpy (buf [0], s, sizeof s);
-std::memcpy (buf [1], s, sizeof s);
+const String x1 = col.transform (s1, s1 + s1_len);
+const String x2 = col.transform (s2, s2 + s2_len);
 
-//
-// Verify that first buffer compares more:
-// |0| = buf [0]
-// |0| = buf [1]
-// 
-buf [0][4] = charT ();
-buf [1][3] = charT ();
+const int colcmp = col.compare (s1, s1 + s1_len, s2, s2 + s2_len);
 
-typedef std::collatecharT Collate;
+int lexcmp = x1.compare (x2);
+lexcmp = lexcmp  -1 ? -1 : 1  lexcmp ? 1 : lexcmp;
+
+rw_assert (colcmp == lexcmp, __FILE__, __LINE__,
+   collate%s::compare (%{*.*Ac}, %{*.*Ac}) = %d, 
+   lexicographical comparison of transformed strings = %d, 
+   mismatch in locale (\%s\), charTname,
+   sizeof (charT), s1_len, s1, 
+   sizeof (charT), s2_len, s2, 
+   colcmp, lexcmp, locname);
+
+const bool eq = 
+std::string (s1, s1 + s1_len) == 
+std::string (s2, s2 + s2_len);
+
+rw_assert (bool (colcmp) != eq, __FILE__, __LINE__,
+   collate%s::compare (%{*.*Ac}, %{*.*Ac}) = %d, 
+   lexicographical compare = %s, mismatch in locale (\%s\),
+   charTname,
+   sizeof (charT), s1_len, s1, 
+   sizeof (charT), s2_len, s2, colcmp,
+   (eq ? true : false), locname);
+}
 
-const Collate col = std::use_facetCollate (loc);
+static void
+check_NUL_collate (const char* charTname, const char* locname, char)
+{
+#define T(s, t) \
+check_NUL_collate (charTname, locname,  \
+   s, sizeof s / sizeof *s - 1, \
+   t, sizeof t / sizeof *t - 1)
+
+T (, );
+T (, \0);
+T (, \0\0);
+T (\0, );
+T (\0, \0);
+T (\0, \0\0);
+T (a, \0);
+T (a, \0a);
+T (a, a\0);
+T (a, a\0\0);
+T (a\0, a);
+T (a\0, a\0);
+T (a\0, a\0\0);
+T (\0a, );
+T (\0a, \0);
+T (\0a, \0a);
+T (\0a, \0a\0);
+T (a\0\0b, );
+T (a\0\0b, a);
+T (a\0\0b, ab);
+T (a\0\0b, a\0);
+T (a\0\0b, a\0\0);
+T (a\0\0b, a\0b);
+T (a\0\0b, a\0\0b);
+}
 
-int cmp = col.compare (
-buf [0], buf [0] + sizeof buf [0] / sizeof *buf [0], 
-buf [1], buf [1] + sizeof buf [1] / sizeof *buf [1]);
-
-rw_assert (cmp  0, __FILE__, __LINE__,
-   collate%s::compare (%{*.*Ac}, %{*.*Ac}) 
- 0, failed in locale (\%s\), charTname,
-   sizeof (charT), sizeof buf [0] / sizeof *buf [0], buf [0],
-   sizeof (charT), sizeof buf [1] / sizeof *buf [1], buf [1],
-   locname);
-
-std::memcpy (buf [0], s, sizeof s);
-std::memcpy (buf [1], s, sizeof s);
-
-//
-// Verify that first compare less:
-// 

Re: [PATCH] STDCXX-1073

2012-10-25 Thread Martin Sebor

On 10/25/2012 06:22 AM, Liviu Nicoara wrote:

On 10/24/12 11:11, Martin Sebor wrote:

On 10/24/2012 07:12 AM, Liviu Nicoara wrote:

[...]
I modified the test according to the suggestions. The test fails all
corresponding wide-char cases and I am investigating other potential
defects as well. For example, I do not think that employing strcoll and
wcscoll in compare is correct as they stop at the first NUL, although
strings may contain characters after the NUL that alter the result of
the comparison.


I would expect the wchar_t specialization to be analogous
to the narrow one. In fact (without looking at the code),
I would even think both could be implemented in terms of
the same function template specialized on the character
type and on the libc string function. (Although I'm not
necessarily suggesting this as the solution to this issue.)



They are not similar, AFAICT. In revision 367462 you implemented the
wide byname do_compare in terms of wcscoll, if available, when using
libc. That implementation does not take into account embedded NULs. This
is in contrast with the narrow byname which simply transforms the
strings and compares them.


IIUC, rev 367462 actually implements the function in terms of
do_transform() when wcscoll() isn't available as a workaround.
Ironically, the workaround is actually better than the default
implementation.



Testing shows that an implementation of wide byname do_compare identical
with the narrow version passes all tests, as expected.

It's easy to me to just remove the wcscoll-based implementation. Also,
error reporting from wcscoll seems difficult to use when libc does not
have thread-local errno's, and right now we don't check for errors.
OTOH, I expect wcscoll to be faster than the (simpler) transformation
followed by comparison.

I incline towards the simpler approach. Thoughts?


There are comments suggesting that calling do_transform()
on the whole string may be suboptimal. Intuitively it makes
sense that calling wcscoll() (in a loop, on the NUL-terminated
substrings, if necessary) should be faster than simply calling
do_transform() followed by wstring::compare(), but it would
make sense to confirm the hypothesis before implementing the
optimization.

Martin



Liviu




Re: [PATCH] STDCXX-1073

2012-10-24 Thread Martin Sebor

On 10/24/2012 07:12 AM, Liviu Nicoara wrote:

A few observations after spending a few hours working on the improved test.

On 10/21/12 19:08, Martin Sebor wrote:

...
There's no requirement that embedded NULs must be preserved
(that's just how it happens to be implemented). I find it best
to avoid relying on the knowledge of implementation details
when exercising the library so that tests don't start failing
after a conforming optimization or some such tweak is added
to the code.



I agree with the implementation details part. However, there is only one
way NULs get processed and that is by keeping them in the string,
verbatim. In this respect the previous test incarnation was a stronger
test.


Consider an encoding where UCHAR_MAX doesn't correspond
to a valid character, and a hypothetical implementation that,
before transforming the string, increments the value of each
character by 1. That will lose all the initial embedded NULs.



I modified the test according to the suggestions. The test fails all
corresponding wide-char cases and I am investigating other potential
defects as well. For example, I do not think that employing strcoll and
wcscoll in compare is correct as they stop at the first NUL, although
strings may contain characters after the NUL that alter the result of
the comparison.


I would expect the wchar_t specialization to be analogous
to the narrow one. In fact (without looking at the code),
I would even think both could be implemented in terms of
the same function template specialized on the character
type and on the libc string function. (Although I'm not
necessarily suggesting this as the solution to this issue.)




Attached is a test case that reproduces the bug without relying
on implementation details. The test passes with your patch,
confirming that it's good. This test still makes the assumption
that strings that lexicographically distinct strings compare
unequal in the en_US locale, but I think that's a safe assumption
regardless of the encoding (though only for the hardcoded strings).


All narrow test cases pass fine, but I gotta look into the wide-char
failures.


Good to hear! I wish I had more time to look into it with you.

Martin



Liviu





Re: [PATCH] STDCXX-1073

2012-10-21 Thread Martin Sebor

The library patch seems correct but the test case in the issue
and the new test aren't strictly conforming. The only requirement
on collate::tranform() is that it return...

A basic_stringcharT value that, compared lexicographically
with the result of calling transform() on another string,
yields the same result as calling do_compare() on the same
two strings.

There's no requirement that embedded NULs must be preserved
(that's just how it happens to be implemented). I find it best
to avoid relying on the knowledge of implementation details
when exercising the library so that tests don't start failing
after a conforming optimization or some such tweak is added
to the code.

Attached is a test case that reproduces the bug without relying
on implementation details. The test passes with your patch,
confirming that it's good. This test still makes the assumption
that strings that lexicographically distinct strings compare
unequal in the en_US locale, but I think that's a safe assumption
regardless of the encoding (though only for the hardcoded strings).

Martin

On 10/18/2012 04:28 PM, Liviu Nicoara wrote:

On 10/18/12 11:02, Martin Sebor wrote:

On 10/16/2012 10:38 AM, Liviu Nicoara wrote:

I have enhanced the collation test, 22.locale.collate.cpp with a bunch
of cases that deal with embedded strings, inside the input strings as
well as at the edges. More defects became apparent, and they have been
fixed.

I have re-worked the src/collate.cpp patch and I am attaching it here.
All collation tests (old and new) pass. If there are no objections, I
will check it in later in the week.


There are tabs in the patch (we need spaces). They also make
the patch harder to review than it otherwise would be. I would
also suggest keeping the MSVC  8.0 block and cleaning things
up separately. That will also make the patch easier to review.
(I'm so swamped that this matters to me.)


I looked at both (latest) patches and they do not have tabs in them.
They were sent in my last post on the 16th, the one you replied to. The
diff without -w is quite a bit verbose because I moved a whole block in
both collate.cpp functions, inside an if, so there are whitespace
differences. svn diff -w gave me a quite decent view of the changes
without having to looks at the actual patched source file.

I am re-attaching the patches, this time created without -w so they
would be a bit verbose. No tabs this time either.



I haven't looked at the test patch yet. If you could clean up
the tab issue and reduce the amount of unnecessary whitespace
changes I promise to review it this weekend.


The test enhancements are pretty plain, just a bunch of added stuff.

Thanks,
Liviu



#include cassert
#include cstdio
#include locale

int main () {
const std::locale loc (en_US);
typedef std::collatechar C;
const C col = std::use_facetC(loc);

static const struct {
const char *a;
unsigned n1;
const char *b;
unsigned n2;
} tests[] = {
#define T(s, t) { s, sizeof s  - 1, t, sizeof t - 1 }

T (, ),
T (, \0),
T (, \0\0),
T (\0, ),
T (\0, \0),
T (\0, \0\0),
T (a, \0),
T (a, \0a),
T (a, a\0),
T (a, a\0\0),
T (a\0, a),
T (a\0, a\0),
T (a\0, a\0\0),
T (\0a, ),
T (\0a, \0),
T (\0a, \0a),
T (\0a, \0a\0),
T (a\0\0b, ),
T (a\0\0b, a),
T (a\0\0b, ab),
T (a\0\0b, a\0),
T (a\0\0b, a\0\0),
T (a\0\0b, a\0b),
T (a\0\0b, a\0\0b),
};

for (unsigned i = 0; i != sizeof tests / sizeof *tests; ++i) {
const char* const a = tests [i].a;
unsigned n1 = tests [i].n1;
const char* const b = tests [i].b;
unsigned n2 = tests [i].n2;

const std::string x1 = col.transform (a, a + n1);
const std::string x2 = col.transform (b, b + n2);

const int colcmp = col.compare (a, a + n1, b, b + n2);
  int lexcmp = x1.compare (x2);
lexcmp = lexcmp  -1 ? -1 : 1  lexcmp ? 1 : lexcmp;

std::printf (%u: %i == %i\n, i, lexcmp, colcmp);

assert (colcmp == lexcmp);
if (colcmp)
assert (std::string (a, a + n1) != std::string (b, b + n2));
else
assert (std::string (a, a + n1) == std::string (b, b + n2));
}
}


Re: [PATCH] STDCXX-1073

2012-10-21 Thread Liviu Nicoara

On 10/21/12 19:08, Martin Sebor wrote:

There's no requirement that embedded NULs must be preserved
(that's just how it happens to be implemented). I find it best
to avoid relying on the knowledge of implementation details
when exercising the library so that tests don't start failing
after a conforming optimization or some such tweak is added
to the code.


That's right. I'll improve it.

Thanks for the review.

Liviu


Re: [PATCH] STDCXX-1073

2012-10-18 Thread Martin Sebor

On 10/16/2012 10:38 AM, Liviu Nicoara wrote:

I have enhanced the collation test, 22.locale.collate.cpp with a bunch
of cases that deal with embedded strings, inside the input strings as
well as at the edges. More defects became apparent, and they have been
fixed.

I have re-worked the src/collate.cpp patch and I am attaching it here.
All collation tests (old and new) pass. If there are no objections, I
will check it in later in the week.


There are tabs in the patch (we need spaces). They also make
the patch harder to review than it otherwise would be. I would
also suggest keeping the MSVC  8.0 block and cleaning things
up separately. That will also make the patch easier to review.
(I'm so swamped that this matters to me.)

I haven't looked at the test patch yet. If you could clean up
the tab issue and reduce the amount of unnecessary whitespace
changes I promise to review it this weekend.

Martin



Thanks,
Liviu

On 10/13/12 11:16, Liviu Nicoara wrote:

The initial patch does not pass the following test case. Have re-worked
the patch and attached it to the incident, and I am also attaching it
here. It passes all collate tests.








Re: [PATCH] STDCXX-1073

2012-10-18 Thread Liviu Nicoara

On 10/18/12 11:02, Martin Sebor wrote:

On 10/16/2012 10:38 AM, Liviu Nicoara wrote:

I have enhanced the collation test, 22.locale.collate.cpp with a bunch
of cases that deal with embedded strings, inside the input strings as
well as at the edges. More defects became apparent, and they have been
fixed.

I have re-worked the src/collate.cpp patch and I am attaching it here.
All collation tests (old and new) pass. If there are no objections, I
will check it in later in the week.


There are tabs in the patch (we need spaces). They also make
the patch harder to review than it otherwise would be. I would
also suggest keeping the MSVC  8.0 block and cleaning things
up separately. That will also make the patch easier to review.
(I'm so swamped that this matters to me.)


I looked at both (latest) patches and they do not have tabs in them. They were 
sent in my last post on the 16th, the one you replied to. The diff without -w 
is quite a bit verbose because I moved a whole block in both collate.cpp 
functions, inside an if, so there are whitespace differences. svn diff -w gave 
me a quite decent view of the changes without having to looks at the actual 
patched source file.

I am re-attaching the patches, this time created without -w so they would be a 
bit verbose. No tabs this time either.



I haven't looked at the test patch yet. If you could clean up
the tab issue and reduce the amount of unnecessary whitespace
changes I promise to review it this weekend.


The test enhancements are pretty plain, just a bunch of added stuff.
 
Thanks,

Liviu

Index: tests/localization/22.locale.collate.cpp
===
--- tests/localization/22.locale.collate.cpp(revision 1399886)
+++ tests/localization/22.locale.collate.cpp(working copy)
@@ -1029,7 +1029,105 @@ check_hash_eff (const char* charTname)
 
 template class charT
 void
-check_NUL_locale (const char* charTname, const char* locname)
+check_NUL_transform (const char* charTname, const char* locname,
+ const charT* src, size_t src_len,
+ const char*  pat, size_t pat_len)
+{
+std::locale loc (locname);
+
+typedef typename std::collatecharT Collate;
+typedef typename Collate::string_type String;
+
+const Collate col = std::use_facetCollate (loc);
+
+String s = col.transform (src, src + src_len);
+
+for (size_t i = 0, j = 0; i  pat_len; ++i) { 
+
+if (j = s.size ()) {
+rw_assert (false, __FILE__, __LINE__,
+   collate%s::transform (%{*.*Ac}) -  %{*.*Ac}
+   , incomplete transformation (locale \%s\), 
+   charTname, sizeof (charT), src_len, src, 
+   sizeof (charT), s.size (), s.c_str (), locname);
+break;
+}
+
+switch (pat [i]) {
+case 0:
+rw_assert (j  s.size ()  0 == s [j], __FILE__, __LINE__,
+   collate%s::transform (\%{*.*Ac}\) -  \%{*.*Ac}\
+   , expected \\0, got %d in position %lu (locale 
\%s\), 
+   charTname, sizeof (charT), src_len, src, sizeof 
(charT), 
+   s.size (), s.c_str (), s [j], j, locname);
+++j;
+break;
+
+case '*':
+rw_assert (j  s.size ()  s [j], __FILE__, __LINE__,
+   collate%s::transform (\%{*.*Ac}\) -  \%{*.*Ac}\
+   , got NUL in position %lu (locale \%s\), charTname,
+   sizeof (charT), src_len, src, sizeof (charT), s.size 
(), 
+   s.c_str (), j, locname);
+for (; j  s.size ()  s [j]; ++j) ;
+break;
+
+default:
+break;
+}
+}
+}
+
+static void
+check_NUL_transform_narrow (const char* charTname, const char* locname)
+{
+#define T(src, pat) \
+check_NUL_transform (charTname, locname,\
+ src, sizeof src / sizeof *src - 1, \
+ pat, sizeof pat / sizeof *pat - 1)
+
+T (\0,\0   );
+T (\0\0,  \0\0 );
+T (\0\0\0,\0\0\0   );
+T (a\0,   *\0  );
+T (\0a\0, \0*\0);
+T (\0a\0, \0*\0);
+T (a\0\0, *\0\0);
+T (a\0\0\0,   *\0\0\0  );
+T (a\0b,  *\0* );
+T (a\0b\0,*\0*\0   );
+T (a\0b\0\0,  *\0*\0\0 );
+}
+
+static void
+check_NUL_transform_wide (const char* charTname, const char* locname)
+{
+T (L\0,   \0   );
+T (L\0\0, \0\0 );
+T (L\0\0\0,   \0\0\0   );
+T (La\0,  *\0  );
+T (L\0a\0,\0*\0);
+T (L\0a\0,\0*\0);
+T (La\0\0,*\0\0);
+T (La\0\0\0,  *\0\0\0  );
+T (La\0b, *\0* );
+T (La\0b\0,   *\0*\0   );
+T (La\0b\0\0, *\0*\0\0 );
+
+#undef T
+}
+
+template class charT
+void
+check_NUL_transform (const char* 

Re: [PATCH] STDCXX-1073

2012-10-16 Thread Liviu Nicoara
I have enhanced the collation test, 22.locale.collate.cpp with a bunch 
of cases that deal with embedded strings, inside the input strings as 
well as at the edges. More defects became apparent, and they have been 
fixed.


I have re-worked the src/collate.cpp patch and I am attaching it here. 
All collation tests (old and new) pass. If there are no objections, I 
will check it in later in the week.


Thanks,
Liviu

On 10/13/12 11:16, Liviu Nicoara wrote:

The initial patch does not pass the following test case. Have re-worked
the patch and attached it to the incident, and I am also attaching it
here. It passes all collate tests.




Index: tests/localization/22.locale.collate.cpp
===
--- tests/localization/22.locale.collate.cpp(revision 1397847)
+++ tests/localization/22.locale.collate.cpp(working copy)
@@ -1029,7 +1029,105 @@ check_hash_eff (const char* charTname)
 
 template class charT
 void
-check_NUL_locale (const char* charTname, const char* locname)
+check_NUL_transform (const char* charTname, const char* locname,
+ const charT* src, size_t src_len,
+ const char*  pat, size_t pat_len)
+{
+std::locale loc (locname);
+
+typedef typename std::collatecharT Collate;
+typedef typename Collate::string_type String;
+
+const Collate col = std::use_facetCollate (loc);
+
+String s = col.transform (src, src + src_len);
+
+for (size_t i = 0, j = 0; i  pat_len; ++i) { 
+
+if (j = s.size ()) {
+rw_assert (false, __FILE__, __LINE__,
+   collate%s::transform (%{*.*Ac}) -  %{*.*Ac}
+   , incomplete transformation (locale \%s\), 
+   charTname, sizeof (charT), src_len, src, 
+   sizeof (charT), s.size (), s.c_str (), locname);
+break;
+}
+
+switch (pat [i]) {
+case 0:
+rw_assert (j  s.size ()  0 == s [j], __FILE__, __LINE__,
+   collate%s::transform (\%{*.*Ac}\) -  \%{*.*Ac}\
+   , expected \\0, got %d in position %lu (locale 
\%s\), 
+   charTname, sizeof (charT), src_len, src, sizeof 
(charT), 
+   s.size (), s.c_str (), s [j], j, locname);
+++j;
+break;
+
+case '*':
+rw_assert (j  s.size ()  s [j], __FILE__, __LINE__,
+   collate%s::transform (\%{*.*Ac}\) -  \%{*.*Ac}\
+   , got NUL in position %lu (locale \%s\), charTname,
+   sizeof (charT), src_len, src, sizeof (charT), s.size 
(), 
+   s.c_str (), j, locname);
+for (; j  s.size ()  s [j]; ++j) ;
+break;
+
+default:
+break;
+}
+}
+}
+
+static void
+check_NUL_transform_narrow (const char* charTname, const char* locname)
+{
+#define T(src, pat) \
+check_NUL_transform (charTname, locname,\
+ src, sizeof src / sizeof *src - 1, \
+ pat, sizeof pat / sizeof *pat - 1)
+
+T (\0,\0   );
+T (\0\0,  \0\0 );
+T (\0\0\0,\0\0\0   );
+T (a\0,   *\0  );
+T (\0a\0, \0*\0);
+T (\0a\0, \0*\0);
+T (a\0\0, *\0\0);
+T (a\0\0\0,   *\0\0\0  );
+T (a\0b,  *\0* );
+T (a\0b\0,*\0*\0   );
+T (a\0b\0\0,  *\0*\0\0 );
+}
+
+static void
+check_NUL_transform_wide (const char* charTname, const char* locname)
+{
+T (L\0,   \0   );
+T (L\0\0, \0\0 );
+T (L\0\0\0,   \0\0\0   );
+T (La\0,  *\0  );
+T (L\0a\0,\0*\0);
+T (L\0a\0,\0*\0);
+T (La\0\0,*\0\0);
+T (La\0\0\0,  *\0\0\0  );
+T (La\0b, *\0* );
+T (La\0b\0,   *\0*\0   );
+T (La\0b\0\0, *\0*\0\0 );
+
+#undef T
+}
+
+template class charT
+void
+check_NUL_transform (const char* charTname, const char* locname)
+{
+check_NUL_transform_narrow (charTname, locname);
+check_NUL_transform_wide   (charTname, locname);
+}
+
+template class charT
+void
+check_NUL_collate (const char* charTname, const char* locname)
 {
 std::locale loc (locname);
 
@@ -1090,6 +1188,14 @@ check_NUL_locale (const char* charTname,
 
 template class charT
 void
+check_NUL (const char* charTname, const char* locname)
+{
+check_NUL_transform charT  (charTname, locname);
+check_NUL_collate charT(charTname, locname);
+}
+
+template class charT
+void
 check_NUL (const char* charTname)
 {
 // Verify that the collate facet correctly handles character
@@ -1103,7 +1209,7 @@ check_NUL (const char* charTname)
 for (const char* locname = rw_locales (LC_COLLATE); 
  *locname; locname += std::strlen (locname) + 1, ++i) {
 try {
-check_NUL_localecharT (charTname, locname);
+ 

Re: [PATCH] STDCXX-1073

2012-10-13 Thread Liviu Nicoara
The initial patch does not pass the following test case. Have re-worked 
the patch and attached it to the incident, and I am also attaching it 
here. It passes all collate tests.


Here is the second test case:

$ cat ../../tests/localization/t.cpp; nice make t.cpp  ./t af_ZA.utf8; 
echo $?

#include iostream
#include locale
#include string

int
main (int argc, char** argv)
{
std::locale loc (argv [1]);

const std::collate char  fac =
std::use_facet std::collate char   (loc);

char const buf [] = a\0\0b;
std::string s = fac.transform (buf, buf + sizeof buf - 1);

size_t i = 0;
for (; i  s.size ()  0 == s [i]; ++i) ;

return !(i == 2);
}
1

On 10/10/12 08:25, Liviu Nicoara wrote:

2012-10-10  Liviu Nicoara  lnico...@apache.org

 * src/collate.cpp (__rw_strnxfrm): preserved embedded NULs



Index: src/collate.cpp
===
--- src/collate.cpp (revision 1397825)
+++ src/collate.cpp (working copy)
@@ -480,113 +480,103 @@
 {
 _STD::string res;
 
-char buf [256];
-char *pbuf = buf;
-
+char buf [256], *pbuf = buf, *psrc = buf;
 size_t bufsize = sizeof buf;
-char *psrc = buf;
 
 while (nchars) {
 
-// using a C-style cast instead of static_cast to avoid
-// a gcc 2.95.2 bug causing an error on some platforms:
-//   static_cast from `void *' to `const char *'
-const char* const last = (const char*)memchr (src, '\0', nchars);
-
-if (0 == last) {
-
-// no NUL found in the initial portion of the source string
-// that fits into the local temporary buffer; copy as many
-// characters as fit into the buffer
+if (src [0]) {
 
-if (bufsize = nchars) {
-if (pbuf != buf)
-delete[] pbuf;
-pbuf = new char [nchars + 1];
+// using a C-style cast instead of static_cast to avoid
+// a gcc 2.95.2 bug causing an error on some platforms:
+//   static_cast from `void *' to `const char *'
+const char* const last = (const char*)memchr (src, '\0', nchars);
+
+if (0 == last) {
+// no NUL found in the initial portion of the source string
+// that fits into the local temporary buffer; copy as many
+// characters as fit into the buffer
+
+if (bufsize = nchars) {
+if (pbuf != buf)
+delete[] pbuf;
+pbuf = new char [nchars + 1];
+}
+
+psrc = pbuf;
+memcpy (psrc, src, nchars);
+
+// append a terminating NUL and decrement the number
+// of characters that remain to be processed
+psrc [nchars] = '\0';
+src  += nchars;
+nchars= 0;
+}
+else {
+// terminating NUL found in the source buffer
+nchars -= (last - src) + 1;
+psrc= _RWSTD_CONST_CAST (char*, src);
+src+= (last - src) + 1;
 }
 
-psrc = pbuf;
-memcpy (psrc, src, nchars);
+#ifdef _RWSTD_OS_SUNOS
+// Solaris 10u5 on AMD64 overwrites memory past the end of
+// just_in_case_buf[8], to avoid this, pass a null pointer
+char* const just_in_case_buf = 0;
+#else
+// provide a destination buffer to strxfrm() in case
+// it's buggy (such as MSVC's) and tries to write to
+// the buffer even if it's 0
+char just_in_case_buf [8];
+#endif // _RWSTD_OS_SUNOS
 
-// append a terminating NUL and decrement the number
-// of characters that remain to be processed
-psrc [nchars] = '\0';
-src  += nchars;
-nchars= 0;
-}
-else {
+const size_t dst_size = strxfrm (just_in_case_buf, psrc, 0);
 
-// terminating NUL found in the source buffer
-nchars -= (last - src) + 1;
-psrc= _RWSTD_CONST_CAST (char*, src);
-src+= (last - src) + 1;
-}
+// check for strxfrm() errors
+if (0 == (dst_size  1)) {
+if (pbuf != buf)
+delete[] pbuf;
 
-#ifdef _RWSTD_OS_SUNOS
-// Solaris 10u5 on AMD64 overwrites memory past the end of
-// just_in_case_buf[8], to avoid this, pass a null pointer
-char* const just_in_case_buf = 0;
-#else
-// provide a destination buffer to strxfrm() in case
-// it's buggy (such as MSVC's) and tries to write to
-// the buffer even if it's 0
-char just_in_case_buf [8];
-#endif
-
-const size_t dst_size = strxfrm (just_in_case_buf, psrc, 0);
-
-// check for strxfrm() errors
-if (0 == (dst_size  1)) {
- 

[PATCH] STDCXX-1073

2012-10-10 Thread Liviu Nicoara

2012-10-10  Liviu Nicoara  lnico...@apache.org

* src/collate.cpp (__rw_strnxfrm): preserved embedded NULs

Index: src/collate.cpp
===
--- src/collate.cpp (revision 1392832)
+++ src/collate.cpp (working copy)
@@ -547,7 +547,7 @@
 
 _TRY {
 // resize the result string to fit itself plus the result
-// of the transformation including the terminatin NUL
+// of the transformation including the terminating NUL
 // appended by strxfrm()
 res.resize (res_size + dst_size + 1);
 }
@@ -557,36 +557,15 @@
 _RETHROW;
 }
 
-// transfor the source string up to the terminating NUL
-size_t xfrm_size =
-strxfrm (res [0] + res_size, psrc, dst_size + 1);
-
-#if defined _MSC_VER  _MSC_VER  1400
-// compute the correct value that should have been returned from
-// strxfrm() after the transformation has completed (MSVC strxfrm()
-// returns a bogus result; see PR #29935)
-xfrm_size = strlen (res [0] + res_size);
-#endif   // MSVC  8.0
-
-// increment the size of the result string by the number
-// of transformed characters excluding the terminating NUL
-// if strxfrm() transforms the empty string into the empty
-// string, keep the terminating NUL, otherwise drop it
-res_size += xfrm_size + (last  !*psrc  !xfrm_size);
-
-_TRY {
-res.resize (res_size);
-}
-_CATCH (...) {
-if (pbuf != buf)
-delete[] pbuf;
-_RETHROW;
-}
+strxfrm (res [0] + res_size, psrc, dst_size + 1);
 }
 
 if (pbuf != buf)
 delete[] pbuf;
 
+if (!res.empty ())
+res.resize (res.size () - 1);
+
 return res;
 }