Doug Ewell, Sun, 6 Jan 2013 20:57:58 -0700:
> We are pretty much going round and round on this. The bottom line for 
> me is, it would be nice if there were a shorthand way of saying 
> "big-endian UTF-16," and many people (including you?) feel that 
> "UTF-16BE" is that way, but it is not. That term has a DIFFERENT 
> MEANING. The following stream:
> 
> FE FF 00 48 00 65 00 6C 00 6C 00 6F
> 
> is valid big-endian UTF-16, but it is NOT valid "UTF-16BE" unless the 
> leading U+FEFF is explicitly meant as a zero-width no-break space, 
> which may not be stripped.

I don't remember whether the RFC defines one of the 3 MIME charsets as 
the default, but given that "UTF-16" is supposed to be used whenever one 
doesn't know the endianness, it seems logical to assume that the above 
example would, by default, be treated as "UTF-16". Apart from that, we 
can also say that the example is not valid "UTF-16" either, unless the 
U+FEFF is meant as a BOM …
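
To make the difference concrete, here is a rough sketch using Python's 
built-in codecs. It assumes that Python's "utf-16" and "utf-16-be" codecs 
are a fair stand-in for the two labels; that is how one implementation 
behaves, not a statement of what the RFC requires. The variable names are 
mine, purely for illustration.

```python
# The byte stream from the quoted message.
stream = bytes.fromhex("FE FF 00 48 00 65 00 6C 00 6C 00 6F")

# Labelled "UTF-16": the leading FE FF is read as a BOM and consumed.
as_utf16 = stream.decode("utf-16")

# Labelled "UTF-16BE": no BOM handling, so U+FEFF stays in the text
# as a character (zero-width no-break space).
as_utf16be = stream.decode("utf-16-be")

print(repr(as_utf16))    # 'Hello'
print(repr(as_utf16be))  # '\ufeffHello'
```

So the same bytes give 5 characters under one label and 6 under the 
other, which is why the label matters here.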

I see the 3 as 3 MIME charsets. 

In any case, it does seem to be a question of definition.
-- 
leif h silli

