On 2005 January 31 Monday 04:56, Graham Klyne wrote:
> How can it make sense to have a BOM in UTF-8? UTF-8 is a sequence of
> octets (bytes); what ordering is there here that can sensibly be varied?
Correct. There is no order to be varied. A BOM came to be permitted because it uses the same code point, U+FEFF, as ZWNBSP (zero width no-break space). Earlier versions of Unicode permitted ZWNBSP just about anywhere in the character sequence. Unicode 4 deprecates this use of U+FEFF.

If I read it correctly, Unicode 4 says that a BOM at the beginning of a UTF-8 encoded stream is not to be taken as part of the text. The BOM has no effect. The rationale for this is that some applications put out a BOM at the beginning of the output regardless of the encoding. Other occurrences of U+FEFF in a UTF-8 encoded stream are significant.
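
To make the rule concrete, here is a minimal sketch of how a reader of UTF-8 input might treat a leading BOM. It assumes the bytestring package; the names utf8Bom and stripLeadingBom are made up for illustration and do not come from any library. In UTF-8, U+FEFF is encoded as the three octets EF BB BF.

```haskell
import qualified Data.ByteString as B

-- U+FEFF encoded in UTF-8 is the octet sequence EF BB BF.
-- (utf8Bom is a hypothetical name chosen for this sketch.)
utf8Bom :: B.ByteString
utf8Bom = B.pack [0xEF, 0xBB, 0xBF]

-- Drop one BOM at the very start of the stream, if there is one.
-- Later occurrences of U+FEFF are left in place, since they count
-- as part of the text.
stripLeadingBom :: B.ByteString -> B.ByteString
stripLeadingBom bytes
  | utf8Bom `B.isPrefixOf` bytes = B.drop (B.length utf8Bom) bytes
  | otherwise                    = bytes
```

Applied to the octets EF BB BF 68 69 this yields 68 69 ("hi"); input without a leading BOM passes through unchanged. Only the first three octets are ever examined, so a second U+FEFF further along survives.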