Doug Ewell, Sun, 6 Jan 2013 20:57:58 -0700:
> We are pretty much going round and round on this. The bottom line for
> me is, it would be nice if there were a shorthand way of saying
> "big-endian UTF-16," and many people (including you?) feel that
> "UTF-16BE" is that way, but it is not. That term has a DIFFERENT
> MEANING. The following stream:
>
> FE FF 00 48 00 65 00 6C 00 6C 00 6F
>
> is valid big-endian UTF-16, but it is NOT valid "UTF-16BE" unless the
> leading U+FEFF is explicitly meant as a zero-width no-break space,
> which may not be stripped.

I don't remember whether the RFC defines one of the three MIME charsets as the default, but given that "UTF-16" is supposed to be used whenever one doesn't know the endianness, it seems logical to assume that the above example would by default be treated as "UTF-16". Apart from that, we can also say that the example is not valid "UTF-16" either, unless the U+FEFF is meant as a BOM … I see the three as three MIME charsets. In any case, it does seem to be a question of definition.
-- 
leif h silli
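
The difference between the two labels can be seen in how a decoder treats the quoted bytes. The following is only a sketch, assuming Python's built-in "utf-16" and "utf-16-be" codecs as stand-ins for the MIME charsets under discussion; the variable names are illustrative and not taken from the thread:

```python
import codecs

# The byte stream quoted above: FE FF followed by "Hello" in big-endian order.
stream = bytes.fromhex("FE FF 00 48 00 65 00 6C 00 6C 00 6F")
assert stream.startswith(codecs.BOM_UTF16_BE)

# Labelled "UTF-16": the leading FE FF is read as a BOM and consumed.
as_utf16 = stream.decode("utf-16")

# Labelled "UTF-16BE": the byte order is already given by the label, so the
# leading U+FEFF stays in the text as a zero-width no-break space.
as_utf16be = stream.decode("utf-16-be")

print(repr(as_utf16))    # 'Hello'
print(repr(as_utf16be))  # '\ufeffHello'
```

Under these assumptions the same twelve bytes yield five characters with one label and six with the other, which is why the choice of label reads as a question of definition.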