Edmund> U+FEFF is the BOM (Byte Order Mark) or ZERO WIDTH NO-BREAK
Edmund> SPACE. It can in some circumstances be useful to have this at the
Edmund> beginning of a file or datastream to distinguish big-endian UTF-16
Edmund> from little-endian UTF-16 (and from UTF-8, etc), however, it can
Edmund> also be harmful, so I don't think iconv should be generating or
Edmund> interpreting BOMs by default.
Without a BOM, there really is no reliable way to tell whether UTF-16 text has
been byte-swapped. We run into this all the time with text generated on Solaris
and Linux (on a little-endian machine), and we depend heavily on the presence
of the BOM to make the text readable.
There are times when it just gets in the way, such as with applications that
don't know about the BOM.
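
To illustrate the byte-swapping point, here is a rough sketch in Python of the
check a reader of UTF-16 data can make. The function name is made up for this
example and has nothing to do with how iconv is implemented. It only looks at
the first two bytes and does not try to tell UTF-16 apart from UTF-32.

```python
import codecs

def detect_utf16_byte_order(data: bytes) -> str:
    """Return 'big', 'little', or 'unknown' based on a leading BOM.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little"
    return "unknown"                           # no BOM: byte order is a guess

# "A" written as big-endian UTF-16 without a BOM is the two bytes 00 41.
raw = "A".encode("utf-16-be")

# Read with the wrong byte order, those bytes still decode without error,
# just to a different character (U+4100), so nothing flags the swap.
swapped = raw.decode("utf-16-le")
```

With a BOM in front, detect_utf16_byte_order() gives the answer directly.
Without one, the reader has to guess, and a wrong guess decodes cleanly to
the wrong text.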
Edmund> Should iconv perhaps have command-line arguments --bom-in and
Edmund> --bom-out or something similar?
Maybe a single command line parameter to explicitly avoid generating a BOM,
but I think one should be generated by default.
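
Roughly what I have in mind, again sketched in Python rather than in iconv
itself. The function and its "bom" parameter are hypothetical and only stand
in for whatever option iconv might grow: the BOM is written unless the caller
explicitly asks for it to be left out.

```python
import codecs

def encode_utf16(text: str, byteorder: str = "big", bom: bool = True) -> bytes:
    """Encode text as UTF-16, prepending a BOM unless bom=False.

    Hypothetical helper for illustration; not an iconv interface.
    """
    if byteorder == "big":
        mark, body = codecs.BOM_UTF16_BE, text.encode("utf-16-be")
    elif byteorder == "little":
        mark, body = codecs.BOM_UTF16_LE, text.encode("utf-16-le")
    else:
        raise ValueError("byteorder must be 'big' or 'little'")
    # The explicit-endian codecs never emit a BOM themselves,
    # so it is added here only when requested.
    return mark + body if bom else body

with_bom = encode_utf16("A")                 # default: BOM included
without_bom = encode_utf16("A", bom=False)   # explicit opt-out
```

Applications that don't know about the BOM would use the opt-out; everything
else gets self-describing output.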
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

Cinema, radio, television, magazines are a school of inattention:
people look without seeing, listen without hearing.
    -- Robert Bresson
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/