Hi, I am working on a project that uses a custom String class. This string
class has these methods:
std::string String::getStdString() const
{
    return std::string(value.begin(), value.end());
}
and:
std::string ZString::asUtf8() const
{
    return Unicode::utf32ToUtf8(value); // UTF-32 -> UTF-8
}
Here is Unicode::utf32ToUtf8():
u8string utf32ToUtf8(const u32string& input)
{
    u8string mb;
    for (auto c : input) {
        if (c <= 0x7fu) {
            // single-byte sequence: 0xxxxxxx
            mb += c;
        } else if (c <= 0x7ffu) {
            // two-byte sequence: 110xxxxx 10xxxxxx
            mb += (c >> 6) | 0xc0u;
            mb += (c & 0x3fu) | 0x80u;
        } else if (c <= 0xffffu && (c < 0xd800u || c > 0xdfffu)) {
            // three-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
            // (UTF-16 surrogates U+D800..U+DFFF are not valid code points)
            mb += (c >> 12) | 0xe0u;
            mb += ((c >> 6) & 0x3fu) | 0x80u;
            mb += (c & 0x3fu) | 0x80u;
        } else if (0x10000u <= c && c <= 0x10ffffu) {
            // four-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            mb += (c >> 18) | 0xf0u;
            mb += ((c >> 12) & 0x3fu) | 0x80u;
            mb += ((c >> 6) & 0x3fu) | 0x80u;
            mb += (c & 0x3fu) | 0x80u;
        } else {
            // surrogate or value above U+10FFFF: stop and return
            // whatever has been converted so far
            break;
        }
    }
    return mb;
}
I have a string ('Александр') that is stored internally as UTF-32,
and I do the following:
printf("1. %s\n", string->getStdString().c_str());
printf("2. %s\n", string->asUtf8().c_str());
And here are the results in the native environment:
1. Александр
2. Александр
And in the browser:
1. ;5:A0=4@ C@LO=>2
2. Александр
Why does this happen? Is it expected behaviour? It was tricky to track
down, because at first I thought it was memory corruption, but it isn't.
--
You received this message because you are subscribed to the Google Groups
"emscripten-discuss" group.