Russ Allbery <r...@debian.org> writes:

> I did a bit more research, and apparently this approach has become more
> blessed again.  I'm glad I looked it up!  As of Unicode 5.0, the
> standard explicitly recommended against doing this, but the latest
> version of the standard is moderately positive about it (although it
> doesn't require it):

>     In UTF-8, the BOM corresponds to the byte sequence <EF₁₆ BB₁₆
>     BF₁₆>. Although there are never any questions of byte order with UTF-8
>     text, this sequence can serve as signature for UTF-8 encoded text
>     where the character set is unmarked.

> (although it does strongly discourage it if there's any other signaling
> method available).

Okay, I experimented with this, but unfortunately less displays the BOM at
the start of the file as a very ugly reverse-video <U+FEFF> at the top of
the screen.
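
For anyone who wants to check a file for this signature, here is a minimal
Python sketch. It assumes the file contents are already in memory as bytes;
the helper names has_utf8_bom and strip_utf8_bom and the sample text are
hypothetical, chosen only for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Illustrative input: a short text with the signature prepended.
sample = codecs.BOM_UTF8 + "example text\n".encode("utf-8")

# Decoding with plain "utf-8" keeps U+FEFF as the first character, which is
# what a pager would then be asked to display; "utf-8-sig" drops it.
with_bom = sample.decode("utf-8")
without_bom = sample.decode("utf-8-sig")
```

The "utf-8-sig" codec is part of the Python standard library and also
accepts input that has no signature at all.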

I think this is arguably a bug in less; this is a control character in a
sense, but the whole point is for it to be invisible, particularly when
it's the first character of the file.  Nonetheless, that's how less
currently behaves.  My feeling is that good display in less is a more
important use case for us than enabling this autorecognition in web
browsers (which will normally be viewing the HTML versions).

Given that, I think the right approach here is to correct the declared
charset on www.debian.org for these files.
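
As a rough way to see what charset a given Content-Type header value
declares, something like the following Python sketch could be used. The
helper name declared_charset and the header values are hypothetical examples,
not what www.debian.org actually sends.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value.

    Returns None when the header value carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Hypothetical header values, for illustration only.
explicit = declared_charset("text/plain; charset=UTF-8")
missing = declared_charset("text/plain")
```

A header without a charset parameter is the "unmarked" case, where the
client has to guess the encoding.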

-- 
Russ Allbery (r...@debian.org)               <http://www.eyrie.org/~eagle/>
