The library patch seems correct but the test case in the issue
and the new test aren't strictly conforming. The only requirement
on collate::tranform() is that it return...

    A basic_string<charT> value that, compared lexicographically
    with the result of calling transform() on another string,
    yields the same result as calling do_compare() on the same
    two strings.

There's no requirement that embedded NULs must be preserved
(that's just how it happens to be implemented). I find it best
to avoid relying on the knowledge of implementation details
when exercising the library so that tests don't start failing
after a conforming optimization or some such tweak is added
to the code.

Attached is a test case that reproduces the bug without relying
on implementation details. The test passes with your patch,
confirming that it's good. This test still makes the assumption
that strings that lexicographically distinct strings compare
unequal in the en_US locale, but I think that's a safe assumption
regardless of the encoding (though only for the hardcoded strings).


On 10/18/2012 04:28 PM, Liviu Nicoara wrote:
On 10/18/12 11:02, Martin Sebor wrote:
On 10/16/2012 10:38 AM, Liviu Nicoara wrote:
I have enhanced the collation test, 22.locale.collate.cpp with a bunch
of cases that deal with embedded strings, inside the input strings as
well as at the edges. More defects became apparent, and they have been

I have re-worked the src/collate.cpp patch and I am attaching it here.
All collation tests (old and new) pass. If there are no objections, I
will check it in later in the week.

There are tabs in the patch (we need spaces). They also make
the patch harder to review than it otherwise would be. I would
also suggest keeping the MSVC < 8.0 block and cleaning things
up separately. That will also make the patch easier to review.
(I'm so swamped that this matters to me.)

I looked at both (latest) patches and they do not have tabs in them.
They were sent in my last post on the 16th, the one you replied to. The
diff without -w is quite a bit verbose because I moved a whole block in
both collate.cpp functions, inside an if, so there are whitespace
differences. svn diff -w gave me a quite decent view of the changes
without having to looks at the actual patched source file.

I am re-attaching the patches, this time created without -w so they
would be a bit verbose. No tabs this time either.

I haven't looked at the test patch yet. If you could clean up
the tab issue and reduce the amount of unnecessary whitespace
changes I promise to review it this weekend.

The test enhancements are pretty plain, just a bunch of added stuff.


#include <cassert>
#include <cstdio>
#include <locale>

int main () {
    const std::locale loc ("en_US");
    typedef std::collate<char> C;
    const C &col = std::use_facet<C>(loc);

    static const struct {
        const char *a;
        unsigned n1;
        const char *b;
        unsigned n2;
    } tests[] = {
#define T(s, t) { s, sizeof s  - 1, t, sizeof t - 1 }

        T ("", ""),
        T ("", "\0"),
        T ("", "\0\0"),
        T ("\0", ""),
        T ("\0", "\0"),
        T ("\0", "\0\0"),
        T ("a", "\0"),
        T ("a", "\0a"),
        T ("a", "a\0"),
        T ("a", "a\0\0"),
        T ("a\0", "a"),
        T ("a\0", "a\0"),
        T ("a\0", "a\0\0"),
        T ("\0a", ""),
        T ("\0a", "\0"),
        T ("\0a", "\0a"),
        T ("\0a", "\0a\0"),
        T ("a\0\0b", ""),
        T ("a\0\0b", "a"),
        T ("a\0\0b", "ab"),
        T ("a\0\0b", "a\0"),
        T ("a\0\0b", "a\0\0"),
        T ("a\0\0b", "a\0b"),
        T ("a\0\0b", "a\0\0b"),

    for (unsigned i = 0; i != sizeof tests / sizeof *tests; ++i) {
        const char* const a = tests [i].a;
        unsigned n1 = tests [i].n1;
        const char* const b = tests [i].b;
        unsigned n2 = tests [i].n2;

        const std::string x1 = col.transform (a, a + n1);
        const std::string x2 = col.transform (b, b + n2);

        const int colcmp = (a, a + n1, b, b + n2);
              int lexcmp = (x2);
        lexcmp = lexcmp < -1 ? -1 : 1 < lexcmp ? 1 : lexcmp;

        std::printf ("%u: %i == %i\n", i, lexcmp, colcmp);

        assert (colcmp == lexcmp);
        if (colcmp)
            assert (std::string (a, a + n1) != std::string (b, b + n2));
            assert (std::string (a, a + n1) == std::string (b, b + n2));

Reply via email to