Re: UTF-16 endianness

Steven Schveighoffer via Digitalmars-d-learn Fri, 29 Jan 2016 16:01:07 -0800

On 1/29/16 6:03 PM, Marek Janukowicz wrote:

On Fri, 29 Jan 2016 17:43:26 -0500, Steven Schveighoffer wrote:

Is there anything I should know about UTF endianness?


It's not any different from other endianness.

In other words, a UTF16 code unit is expected to be in the endianness of
the platform you are running on.

If you are on x86 or x86_64 (very likely), then it should be little endian.

If your source of data is big-endian (or opposite from your native
endianness),


To be precise - my case is IMAP UTF7 folder name encoding and I finally found
out it's indeed big endian, which explains my problem (as I'm indeed on x86_64).

it will have to be converted before treating as a wchar[].


Is there any clever way to do the conversion? Or do I need to swap the bytes
manually?


No clever way, just the straightforward way ;)

Swapping the endianness of a 32-bit value can be done with core.bitop.bswap. Doing it with 16 bits, I believe you have to do bit shifting. Something like:

foreach(ref elem; wcharArr) elem = ((elem << 8) & 0xff00) | ((elem >> 8) & 0x00ff);


Or you can do it with the bytes directly, before casting to wchar[].
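
For the byte-level route, here is a minimal sketch that assumes the input is a buffer of raw bytes holding big-endian UTF-16 (high byte first). The name decodeUtf16BE is made up for illustration and is not a Phobos function. Because each code unit is assembled from its two bytes arithmetically, the result should come out right regardless of the host's endianness.

```d
// Hypothetical helper (not part of Phobos): build native-endian UTF-16
// code units from a buffer of big-endian bytes.
wchar[] decodeUtf16BE(const(ubyte)[] bytes)
{
    assert(bytes.length % 2 == 0, "expected an even number of bytes");

    auto result = new wchar[](bytes.length / 2);
    foreach (i, ref unit; result)
    {
        // In big-endian data the high byte comes first.
        unit = cast(wchar)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
    }
    return result;
}

unittest
{
    // "Hi" as UTF-16BE: 0x0048 0x0069
    ubyte[] raw = [0x00, 0x48, 0x00, 0x69];
    assert(decodeUtf16BE(raw) == "Hi"w);
}
```

This allocates a new array instead of reinterpreting the byte buffer, which avoids any alignment or aliasing questions at the cost of one copy.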

Note the version identifiers BigEndian and LittleEndian can be used to
compile the correct code.


This solution is of no use to me as I don't want to change the endianness in
general.


What I mean is that you can annotate your code with version statements like:

version(LittleEndian)
{
   // perform the byteswap
   ...
}

so your code is portable to BigEndian systems (where you would not want to byte swap).
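
Putting the two pieces together, a rough sketch could look like the following. It assumes the wchar[] was filled by reinterpreting big-endian data, so each element is byte-swapped only on little-endian hosts. The function name fromBigEndianInPlace is hypothetical.

```d
// Hypothetical helper: the array holds UTF-16 code units whose bytes are
// in big-endian order; convert them to the platform's native order.
void fromBigEndianInPlace(wchar[] units)
{
    version (LittleEndian)
    {
        // Swap the two bytes of every code unit.
        foreach (ref elem; units)
            elem = cast(wchar)(((elem << 8) & 0xff00) | ((elem >> 8) & 0x00ff));
    }
    // On BigEndian platforms the data is already in native order,
    // so nothing is compiled in here.
}
```

The explicit cast(wchar) is only there to make the narrowing from int obvious; the swap itself is the same expression as in the one-liner above.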


-Steve
