Re: UTF32/UTF8 in std::string

Александр Гурьянов Sun, 18 Feb 2018 17:40:49 -0800

Sorry, this was my mistake. The native and browser outputs are identical.

2018-02-16 18:34 GMT+07:00 Александр Гурьянов <[email protected]>:
> Hi, I am working on a project that uses a custom String class. This
> class has the following methods:
>
> std::string String::getStdString() const
> {
>     return std::string(value.begin(), value.end());
> }
>
> and:
>
> std::string ZString::asUtf8() const
> {
>     return Unicode::utf32ToUtf8(value); // UTF32 -> UTF8
> }
>
> --
>
> u8string utf32ToUtf8(const u32string& input)
> {
>     u8string mb;
>
>     for (std::size_t i = 0, length = input.size(); i < length; ++i) {
>         auto c = input[i];
>
>         if (c <= 0x7fu) {
>             // single-byte sequence
>             mb += c;
>         } else if (0x80u <= c && c <= 0x7ffu) {
>             // two-byte sequence
>             mb += (c >> 6) | 0xc0u;
>             mb += (c & 0x3fu) | 0x80u;
>         } else if ((0x800u <= c && c <= 0xfffu)
>                    || (0x1000u <= c && c <= 0xcfffu)
>                    || (0xd000u <= c && c <= 0xd7ffu)
>                    || (0xe000u <= c && c <= 0xffffu)) {
>             // three-byte sequence
>             mb += (c >> 12) | 0xe0u;
>             mb += ((c >> 6) & 0x3fu) | 0x80u;
>             mb += (c & 0x3fu) | 0x80u;
>         } else if ((0x10000u <= c && c <= 0x3ffffu)
>                    || (0x40000u <= c && c <= 0xfffffu)
>                    || (0x100000u <= c && c <= 0x10ffffu)) {
>             // four-byte sequence
>             mb += (c >> 18) | 0xf0u;
>             mb += ((c >> 12) & 0x3fu) | 0x80u;
>             mb += ((c >> 6) & 0x3fu) | 0x80u;
>             mb += (c & 0x3fu) | 0x80u;
>         } else {
>             break;
>         }
>     }
>
>     return mb;
> }
>
> I have a string ('Александр') that is internally encoded as UTF-32,
> and I do the following:
> printf("1. %s\n", string->getStdString().c_str());
> printf("2. %s\n", string->asUtf8().c_str());
>
> And here are the results for the native environment:
> 1. Александр
> 2. Александр
>
> For the browser:
> 1.  ;5:A0=4@  C@LO=>2
> 2. Александр
>
> Why is that? Is this expected behaviour? It was tricky to find, because I
> thought there was memory corruption, but there is not.
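
For anyone who runs into the same symptom: the garbled line 1 is consistent with what the iterator-range constructor in getStdString() does. It converts each UTF-32 code unit to a single char, so on typical platforms only the low byte of each code point survives (for example U+043B 'л' becomes 0x3B, which is ';'). Below is a minimal sketch of that effect next to a compact UTF-8 encoder. It assumes the internal value is a std::u32string; narrowPerCodePoint, encodeUtf8 and selfCheck are hypothetical names used only for this illustration, not part of the project quoted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for getStdString(): the range constructor converts
// every char32_t to char, which on typical platforms keeps only the low 8 bits.
std::string narrowPerCodePoint(const std::u32string& value)
{
    return std::string(value.begin(), value.end());
}

// Compact UTF-32 -> UTF-8 encoder. Like the quoted utf32ToUtf8, it stops at
// the first value that is not a valid Unicode scalar value.
std::string encodeUtf8(const std::u32string& input)
{
    std::string out;
    for (char32_t c : input) {
        if (c <= 0x7F) {
            // single-byte sequence
            out += static_cast<char>(c);
        } else if (c <= 0x7FF) {
            // two-byte sequence
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c <= 0xFFFF) {
            if (c >= 0xD800 && c <= 0xDFFF) {
                break; // surrogate code points cannot be encoded
            }
            // three-byte sequence
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c <= 0x10FFFF) {
            // four-byte sequence
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            break; // beyond the Unicode range
        }
    }
    return out;
}

// Small self-check using three Cyrillic letters from the example string.
void selfCheck()
{
    const std::u32string lek = U"\u043B\u0435\u043A"; // "лек"
    assert(narrowPerCodePoint(lek) == ";5:");          // low bytes only
    assert(encodeUtf8(lek) == "\xD0\xBB\xD0\xB5\xD0\xBA");
}
```

The ";5:" produced by the narrowing version matches the start of the garbled output quoted above, which suggests truncation rather than memory corruption. Whether the truncated bytes happen to display correctly depends on the terminal and locale, so printing the asUtf8() result is the safer choice.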

