On 1/29/16 6:03 PM, Marek Janukowicz wrote:
> On Fri, 29 Jan 2016 17:43:26 -0500, Steven Schveighoffer wrote:
>>> Is there anything I should know about UTF endianness?
>> It's not any different from other endianness.
>> In other words, a UTF-16 code unit is expected to be in the endianness of
>> the platform you are running on.
>> If you are on x86 or x86_64 (very likely), then it should be little endian.
>> If your source of data is big-endian (or opposite from your native
>> endianness),
> To be precise - my case is IMAP UTF-7 folder name encoding and I finally found
> out it's indeed big endian, which explains my problem (as I'm indeed on x86_64).
>> it will have to be converted before treating as a wchar[].
> Is there any clever way to do the conversion? Or do I need to swap the bytes
> manually?
No clever way, just the straightforward way ;)
Swapping the endianness of a 32-bit value can be done with core.bitop.bswap.
For 16 bits, I believe you have to do bit shifting. Something like:
    foreach(ref elem; wcharArr)
        elem = cast(wchar)(((elem << 8) & 0xff00) | ((elem >> 8) & 0x00ff));
Or you can do it with the bytes directly, before casting the data to wchar[].
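A minimal sketch of the byte-level approach, assuming the input arrives as raw bytes holding big-endian UTF-16 code units. The function name fromBigEndianBytes is made up for illustration. Because each code unit is assembled from its two bytes by value, this should behave the same regardless of the platform's endianness:

```d
// Hypothetical helper (not from the thread): builds UTF-16 code units
// from raw big-endian bytes, two bytes per code unit.
wchar[] fromBigEndianBytes(const(ubyte)[] bytes)
{
    assert(bytes.length % 2 == 0, "expected an even number of bytes");
    auto result = new wchar[](bytes.length / 2);
    foreach (i, ref unit; result)
    {
        // In big-endian data the first byte of each pair is the high-order byte.
        unit = cast(wchar)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
    }
    return result;
}
```

This allocates a new array instead of swapping in place, which avoids ever holding a wchar[] whose contents are in the wrong byte order.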
>> Note the version identifiers BigEndian and LittleEndian can be used to
>> compile the correct code.
> This solution is of no use to me as I don't want to change the endianness in
> general.
What I mean is that you can annotate your code with version statements like:
version(LittleEndian)
{
    // perform the byteswap
    ...
}
so your code is portable to BigEndian systems (where you would not want
to byte swap).
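To make that concrete, here is a rough sketch, assuming the array holds big-endian UTF-16 data that was reinterpreted as wchar[] without any conversion. The helper name bigEndianToNativeInPlace is hypothetical:

```d
// Hypothetical helper (not from the thread): converts big-endian UTF-16
// code units to the platform's native byte order, in place.
void bigEndianToNativeInPlace(wchar[] units)
{
    version (LittleEndian)
    {
        // Native order differs from the data, so swap the two bytes of each unit.
        foreach (ref elem; units)
            elem = cast(wchar)(((elem << 8) & 0xff00) | ((elem >> 8) & 0x00ff));
    }
    // On a BigEndian platform the data is already in native order,
    // so the function body compiles to nothing.
}
```

The caller does not need to know which platform it is on; it just calls bigEndianToNativeInPlace on the freshly read data and gets native-order code units either way.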
-Steve