Re: UTF-8 BOM, really!? (was: [Haskell-cafe] Re: File path programme)

Scott Turner Mon, 31 Jan 2005 07:31:12 -0800

On 2005 January 31 Monday 04:56, Graham Klyne wrote:
> How can it make sense to have a BOM in UTF-8?  UTF-8 is a sequence of
> octets (bytes);  what ordering is there here that can sensibly be varied?


Correct. There is no order to be varied.

A BOM came to be permitted because it uses the same code point, U+FEFF, as 
ZWNBSP (zero width no-break space). Earlier versions of Unicode permit ZWNBSP 
just about anywhere in the character sequence.  Unicode 4 deprecates this use 
of U+FEFF as a zero width no-break space.
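
As a quick illustration (a sketch in Python, chosen only for convenience; the 
variable names are my own), the standard library can show which character 
U+FEFF is and how it serializes. Only the UTF-16 forms differ by byte order; 
the UTF-8 form is a single fixed byte sequence.

```python
import unicodedata

# U+FEFF doubles as the byte order mark and as ZERO WIDTH NO-BREAK SPACE.
BOM = "\ufeff"

name = unicodedata.name(BOM)          # official Unicode character name
utf8 = BOM.encode("utf-8")            # one fixed sequence, no order to vary
utf16_be = BOM.encode("utf-16-be")    # big-endian serialization
utf16_le = BOM.encode("utf-16-le")    # little-endian serialization

print(name)
print(utf8.hex(), utf16_be.hex(), utf16_le.hex())
```

The two UTF-16 results are byte-swapped versions of each other, which is what 
lets a reader detect byte order; UTF-8 has nothing comparable to detect.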

If I read it correctly, Unicode 4 says that a BOM at the beginning of a UTF-8 
encoded stream is not to be taken as part of the text. There the BOM has no 
effect. The rationale for this is that some applications put out a BOM at the 
beginning of the output regardless of the encoding.  Any other occurrence of 
U+FEFF (zero width no-break space) in a UTF-8 encoded stream is significant 
and remains part of the text.
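
A reader following that rule might look like the sketch below. This is an 
illustration in Python, not anything prescribed by the standard's text; the 
helper name strip_leading_bom is hypothetical. It drops one BOM at the very 
start of the byte stream and leaves every later U+FEFF in place.

```python
import codecs


def strip_leading_bom(data: bytes) -> bytes:
    """Drop a single UTF-8 BOM (EF BB BF) if it starts the stream."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A leading BOM followed by text that itself contains U+FEFF.
raw = codecs.BOM_UTF8 + b"ab" + codecs.BOM_UTF8 + b"c"

# Only the leading BOM is removed; the inner U+FEFF stays in the text.
text = strip_leading_bom(raw).decode("utf-8")
print(repr(text))
```

Python's built-in "utf-8-sig" codec applies the same treatment on decoding, so 
raw.decode("utf-8-sig") should give the same string.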
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
