Hi Nicholas,

UTF-8 datastreams can contain a BOM. However, UTF-8 is byte oriented and
always has the same byte order. A BOM can be used as a signature, but it
will make no difference to the endianness of the bytestream. I agree
with you that it may be helpful to some applications to identify the
encoding form.
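
As a rough illustration of the signature idea, here is a minimal Python sketch. The helper name has_utf8_bom is made up for this example; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library. It shows that the BOM is just a fixed three-byte prefix and that the payload bytes are the same with or without it.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte UTF-8 signature: EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 signature.

    Hypothetical helper, named for this example only.
    """
    return data.startswith(UTF8_BOM)


text = "h\u00e9llo"
plain = text.encode("utf-8")        # no BOM
signed = text.encode("utf-8-sig")   # the codec prepends the BOM

# The payload bytes are identical; only the three-byte prefix differs.
assert signed == UTF8_BOM + plain

# A reader using "utf-8-sig" accepts both forms and yields the same text.
assert plain.decode("utf-8-sig") == text
assert signed.decode("utf-8-sig") == text
```

An editor could use a check like has_utf8_bom to guess the encoding, but the absence of a BOM says nothing either way.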

The danger, though, is that some recipients of UTF-8 encoded data do not
expect a BOM. Especially where UTF-8 is used in 8-bit environments, a BOM
will interfere with any protocol or file format that expects specific
ASCII characters at the beginning, such as the "#!" at the beginning of
Unix shell scripts.
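
To make the "#!" problem concrete, here is a small Python sketch. Both function names are made up for this example, and looks_like_script only imitates the kind of leading-bytes check a loader might do; it is not any real system's implementation.

```python
import codecs

script = "#!/bin/sh\necho hello\n"
without_bom = script.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom


def looks_like_script(data: bytes) -> bool:
    """Imitate a consumer that expects '#!' as the very first two bytes.

    Hypothetical check, for illustration only.
    """
    return data[:2] == b"#!"


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; otherwise return data unchanged.

    Hypothetical helper, for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Without a BOM, the first two bytes are '#!' as expected.
assert looks_like_script(without_bom)

# With a BOM, the file starts with EF BB BF, so the check fails.
assert not looks_like_script(with_bom)

# A recipient that knows about the BOM can strip it before checking.
assert looks_like_script(strip_utf8_bom(with_bom))
```

The recipient has to be the one to strip the BOM, which is exactly what recipients that do not expect one will not do.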

Cheers,

-Ozgur Sahoglu

________________________________

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 18, 2008 7:38 AM
To: [email protected]
Subject: BOM for UTF-8

Hello,

Has there been any discussion of, or thought given to, adding a BOM to a
UTF-8 serialized file if the developer specifically sets the BOM feature?
By default, the BOM should not be written, but it is pretty useful for
certain editors to correctly identify the underlying encoding if they are
not parsing the first line.

Thanks,

Nicholas Thayer
