> On Thu, Apr 19, 2001 at 06:24:47PM -0700, Markus Scherer wrote:
> > On the other hand, if you get a file from your platform and it is
> > in 16-bit Unicode, then you would appreciate the convenience of the
> > auto-endian alias.
> 
> But nothing should be spitting out platform-endian UTF-16! In the
> case that there's a lot of unmarked big-endian UTF-16 around (as I
> understand the ISO-10646 standard recommends), then that assumption
> that everything emits unmarked platform-dependent UTF-16 will be
> wrong.

And for reference, on Windows, Unicode files are recognized because they
have a BOM. Write plain UTF-16LE w/o a BOM, and your file won't be
recognized properly. Manipulating these files w/ ICU today is a bit
painful, since one needs to strip the BOM on input (if I understand Markus
correctly) and write a BOM on output. So these files cannot be manipulated
with applications like uconv, which blindly use the raw converters.
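
To make the strip-on-input / write-on-output dance concrete, here is a
rough sketch. It uses Python's standard codecs module rather than ICU, and
the function names (decode_utf16, encode_utf16le_with_bom) are made up for
illustration. Treating unmarked input as big-endian is only an assumption
here, picked to match the point quoted above.

```python
import codecs

def decode_utf16(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UTF-16 bytes, stripping a leading BOM if one is present.

    Unmarked input falls back to `default` (big-endian here, which is an
    assumption of this sketch, not something ICU does for you).
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return data.decode(default)

def encode_utf16le_with_bom(text: str) -> bytes:
    """Encode as UTF-16LE and prepend the BOM that Windows tools look for."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

A raw UTF-16LE converter used on its own would pass the BOM through as
U+FEFF on input and would not emit one on output, which is why both steps
end up in the caller's hands.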

YA
