Hi,
I want to write serialization functions for encoding and decoding
strings in binary file formats and network messages. This requires
converting the internal representations (wchar_t and multibyte
strings) to various character encodings such as UCS-2LE, ISO-8859-1,
UTF-8, ASCII, etc. But it's not clear to me how to do this correctly
and portably (or at least portably enough that when it comes time to
port I don't smack myself in the forehead).
I gather that the most I can assume is that a wchar_t string is just
a sequence of UCS code values, each sizeof(wchar_t) bytes wide. But is
the in-memory representation of a multibyte string equivalent to the
UTF-8 encoding, such that I can simply write it to a stream and read
it back as UTF-8?
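I suppose that means checking the locale's codeset at run time rather
than hard-coding UTF-8. A minimal sketch of what I mean, using the
POSIX nl_langinfo() interface:

    #include <langinfo.h>
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* The multibyte encoding depends on the current locale,
           so it seems it has to be queried at run time. */
        setlocale(LC_CTYPE, "");
        const char *codeset = nl_langinfo(CODESET);
        printf("locale codeset: %s\n", codeset);

        /* Presumably only when this reports UTF-8 can a multibyte
           string be written to a stream verbatim as UTF-8. */
        return strcmp(codeset, "UTF-8") != 0;
    }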
So as an example case: to encode wchar_t to UTF-16LE, I must convert
each character to a definitive encoding such as UCS-4 and then use
iconv to get to UTF-16LE. For a multibyte string, can I use iconv on
the mbs directly to get UTF-16LE?
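Concretely, I imagine the wide-character case looking something like
the sketch below. I'm assuming iconv accepts "WCHAR_T" as a
from-encoding (glibc and GNU libiconv do; elsewhere I suppose I'd
have to widen to UCS-4 by hand first), and to_utf16le() is just a
name I made up:

    #include <iconv.h>
    #include <stdlib.h>
    #include <wchar.h>

    /* Hypothetical helper: encode a wide string as UTF-16LE bytes.
       Returns a malloc'd buffer and sets *outlen, or NULL on error. */
    static char *to_utf16le(const wchar_t *ws, size_t *outlen)
    {
        iconv_t cd = iconv_open("UTF-16LE", "WCHAR_T");
        if (cd == (iconv_t)-1)
            return NULL;

        size_t inbytes = wcslen(ws) * sizeof(wchar_t);
        size_t outbytes = inbytes * 2 + 4;  /* generous upper bound */
        char *buf = malloc(outbytes);
        char *inp = (char *)ws;             /* iconv wants non-const */
        char *outp = buf;

        if (buf == NULL ||
            iconv(cd, &inp, &inbytes, &outp, &outbytes) == (size_t)-1) {
            free(buf);
            iconv_close(cd);
            return NULL;
        }
        *outlen = (size_t)(outp - buf);
        iconv_close(cd);
        return buf;
    }

And for the multibyte case I'd guess the same shape, just opened with
iconv_open("UTF-16LE", nl_langinfo(CODESET)) so that the from-encoding
matches whatever the locale actually uses. Is that right?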
Another example might be decoding CP1252: I could use iconv to get
to, say, UCS-4, then decode each 4-byte sequence to a uint32_t and
reassign it in place to correct for the byte order of the host?
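Something along these lines (again just a sketch; I picked "UCS-4BE"
deliberately so that ntohl() yields host byte order, and from_cp1252()
is a made-up name):

    #include <arpa/inet.h>  /* ntohl() */
    #include <iconv.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical helper: decode CP1252 bytes into host-order UCS-4
       code points. Returns a malloc'd array and sets *n, or NULL. */
    static uint32_t *from_cp1252(const char *in, size_t inlen, size_t *n)
    {
        /* Ask for big-endian UCS-4 explicitly so the byte order is
           known, then let ntohl() correct it for the host. */
        iconv_t cd = iconv_open("UCS-4BE", "CP1252");
        if (cd == (iconv_t)-1)
            return NULL;

        size_t inbytes = inlen;
        size_t outbytes = inlen * 4;    /* one byte -> one code point */
        uint32_t *buf = malloc(outbytes);
        char *inp = (char *)in;         /* iconv wants non-const */
        char *outp = (char *)buf;

        if (buf == NULL ||
            iconv(cd, &inp, &inbytes, &outp, &outbytes) == (size_t)-1) {
            free(buf);
            iconv_close(cd);
            return NULL;
        }
        iconv_close(cd);

        *n = (size_t)(outp - (char *)buf) / 4;
        for (size_t i = 0; i < *n; i++)
            buf[i] = ntohl(buf[i]);     /* big-endian -> host order */
        return buf;
    }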
Thanks,
Mike
PS: When encoding to ASCII, do I want to shave off the 8th bit?
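My guess is that masking with c & 0x7F would silently corrupt anything
non-ASCII, so I'd sooner reject such characters outright. Roughly
(encode_ascii() is a made-up name):

    #include <stdint.h>

    /* Hypothetical: encode one UCS code point as ASCII. */
    static int encode_ascii(uint32_t cp, char *out)
    {
        if (cp > 0x7F)
            return -1;      /* not representable in ASCII: reject,
                               rather than shave bits off */
        *out = (char)cp;
        return 0;
    }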
--
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml