Edmund GRIMLEY EVANS writes:
> So, UTF-16 gives you bigendian with BOM
No. That was the not-really-standardized official reading before RFC
2781 came out, and it was not followed by some well-known proprietary
operating system on little-endian CPUs.
Now the RFC says:
- When programs produce UTF-16, they must provide a BOM, and the
endianness is not necessarily big-endian.
- When programs interpret UTF-16 text, they must check whether there
is a BOM. Only if there is no BOM may they interpret it as big-endian.
(But I wouldn't bet on this either.)
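
The reading rule above can be sketched in Python roughly as follows.
This is only an illustration under the assumption that the whole text
is available as a byte string; the function name decode_utf16 is made
up for this example and is not from the RFC or any library.

```python
import codecs

def decode_utf16(data: bytes) -> str:
    """Decode text labelled "UTF-16": honor a BOM, else assume big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian, as described above.
    return data.decode("utf-16-be")
```

Note that Python's own "utf-16" decoder falls back to the machine's
native byte order when there is no BOM, which is why the fallback is
spelled out explicitly here.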
> UTF-16BE gives you big-endian without BOM and
> UTF-16LE gives you little-endian without BOM.
Yes.
> How do I ask for the machine's native ordering with or without BOM?
If you do want a BOM, use "UTF-16". If not, use UTF-16BE or UTF-16LE,
depending on autoconf's AC_C_BIGENDIAN result.
Or better, use UTF-8 instead.
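
For illustration, the same choice of encoding name can be made at run
time instead of at configure time. A minimal sketch in Python, where
sys.byteorder plays the role of AC_C_BIGENDIAN; the helper name
native_utf16_name is hypothetical:

```python
import sys

def native_utf16_name(with_bom: bool) -> str:
    """Pick an encoding name for native byte order, with or without BOM."""
    if with_bom:
        # Python's "UTF-16" encoder writes a BOM followed by native-order data.
        return "UTF-16"
    # Without a BOM, the byte order has to be named explicitly.
    return "UTF-16BE" if sys.byteorder == "big" else "UTF-16LE"

# Example: encode a string in native order without a BOM.
encoded = "A".encode(native_utf16_name(with_bom=False))
```

Whether "UTF-16" produces native order on output depends on the
converter; the comment above describes Python's codec only, and other
implementations may behave differently.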
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/