UTF32/UTF8 in std::string

Александр Гурьянов Fri, 16 Feb 2018 03:35:59 -0800

Hi, I am working on a project that uses a custom String class. This
class has the following methods:


std::string String::getStdString() const
{
    return std::string(value.begin(), value.end());
}

and:

std::string ZString::asUtf8() const
{
    return Unicode::utf32ToUtf8(value); // UTF-32 -> UTF-8
}

--

u8string utf32ToUtf8(const u32string& input)
{
    u8string mb;

    for (std::size_t i = 0, length = input.size(); i < length; ++i) {
        auto c = input[i];

        if (c <= 0x7fu) {
            // single-byte sequence
            mb += c;
        } else if (0x80u <= c && c <= 0x7ffu) {
            // two-byte sequence
            mb += (c >> 6) | 0xc0u;
            mb += (c & 0x3fu) | 0x80u;
        } else if ((0x800u <= c && c <= 0xfffu) || (0x1000u <= c && c <= 0xcfffu)
                   || (0xd000u <= c && c <= 0xd7ffu)
                   || (0xe000u <= c && c <= 0xffffu)) {
            // three-byte sequence
            mb += (c >> 12) | 0xe0u;
            mb += ((c >> 6) & 0x3fu) | 0x80u;
            mb += (c & 0x3fu) | 0x80u;
        } else if ((0x10000u <= c && c <= 0x3ffffu) || (0x40000u <= c && c <= 0xfffffu)
                   || (0x100000u <= c && c <= 0x10ffffu)) {
            // four-byte sequence
            mb += (c >> 18) | 0xf0u;
            mb += ((c >> 12) & 0x3fu) | 0x80u;
            mb += ((c >> 6) & 0x3fu) | 0x80u;
            mb += (c & 0x3fu) | 0x80u;
        } else {
            break;
        }
    }

    return mb;
}
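
As a worked example of the two-byte branch above, the sketch below encodes
a single code point from the test string, U+0410 ('А'). The helper name
encodeTwoByte is hypothetical and not part of the project, and it returns
std::string instead of the project's u8string, whose definition is not shown
here.

```cpp
#include <string>

// Hypothetical helper: mirrors only the two-byte branch of utf32ToUtf8.
// The caller must pass a code point in the range 0x80..0x7ff.
std::string encodeTwoByte(char32_t c)
{
    std::string mb;
    // Lead byte: 110xxxxx, carrying the top 5 bits of the code point.
    mb += static_cast<char>((c >> 6) | 0xc0u);
    // Continuation byte: 10xxxxxx, carrying the low 6 bits.
    mb += static_cast<char>((c & 0x3fu) | 0x80u);
    return mb;
}
```

For U+0410, 0x410 >> 6 is 0x10, so the lead byte is 0xD0, and 0x410 & 0x3f
is 0x10, so the continuation byte is 0x90. The result is the byte pair
D0 90.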

I have a string ('Александр') that is stored internally as UTF-32,
and I do the following:
printf("1. %s\n", string->getStdString().c_str());
printf("2. %s\n", string->asUtf8().c_str());

These are the results in the native environment:
1. Александр
2. Александр

And in the browser:
1.  ;5:A0=4@  C@LO=>2
2. Александр

Why is that? Is this expected behaviour? It was tricky to track down,
because at first I thought it was memory corruption, but it is not.
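
A minimal sketch that may help narrow this down, assuming the internal
value is a std::u32string (its type is not shown above). The function
narrowCopy is a hypothetical stand-in for getStdString(). The
iterator-range constructor of std::string converts each 32-bit code unit
to a single char and does no UTF-8 encoding, so on typical platforms only
the low byte of each code point survives.

```cpp
#include <string>

// Hypothetical stand-in for String::getStdString(), assuming `value`
// is a std::u32string.
std::string narrowCopy(const std::u32string& value)
{
    // Each char32_t element is converted to char individually,
    // so U+043B (0x043b) ends up as the byte 0x3b, i.e. ';'.
    return std::string(value.begin(), value.end());
}
```

Passing the code points of "лександр" (U+043B, U+0435, U+043A, U+0441,
U+0430, U+043D, U+0434, U+0440) through this function yields ";5:A0=4@",
which matches the pattern in browser output 1. This sketch does not explain
why the native build prints the expected text.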
