Thomas Dickey wrote:


Lynx assumes the document charset is ISO-8859-1 if it's not given.
(That was the rule for some time - for HTML - perhaps we're not
discussing HTML anymore).

It hasn't been the rule for around a decade; HTML 4.0 overrides HTTP/1.1 for the text/html media type. Failure to specify a charset is an error. Browsers must not assume a default, but may use heuristics. (In practice, one of the heuristics is to use a default!)

Without a charset, therefore, it is reasonable, but not required, for a browser to assume that something that starts with a UTF-* byte order sequence is UTF-*.
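As a rough illustration of that heuristic (a sketch only, not what Lynx or any other browser actually does), a byte order mark check could look something like the following. The function name sniff_bom and its fallback parameter are made up for the example; the BOM constants come from Python's standard codecs module.

```python
import codecs

# Longer BOMs are tested first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data, fallback=None):
    """Return the encoding implied by a leading BOM, else the fallback.

    Hypothetical helper: only meant for documents that arrive
    without any declared charset.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return fallback
```

A caller would only consult this when no charset was declared, and would pass whatever default it would otherwise have guessed as the fallback.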

Also, outside the USA and Western Europe, it became quite common practice to use tools that declared windows-1252, etc., but then actually sent the local encoding. People in those regions had no problems, as they locked their browsers into, say, GB2312 and ignored the declared charset completely. Nowadays there is a mix of UTF and GB2312, so that strategy may no longer work.

Setting that to UTF-8 makes it display properly.

0xFE is a valid ISO-8859-1 code, as your terminal emulator shows...
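To make that concrete, here is a small sketch (my own illustration, not anything taken from Lynx) showing how the single byte 0xFE behaves under the two encodings: it maps to U+00FE (lowercase thorn) in ISO-8859-1, whereas 0xFE never occurs in well-formed UTF-8. The variable names are just for the example.

```python
raw = bytes([0xFE])

# ISO-8859-1 maps every byte value directly to the code point
# with the same number, so 0xFE becomes U+00FE (lowercase thorn).
as_latin1 = raw.decode("iso-8859-1")

# The byte 0xFE is never valid in UTF-8, so strict decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

So a browser that falls back to ISO-8859-1 will happily render the byte as a character rather than report an error.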



--
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.


_______________________________________________
Lynx-dev mailing list
Lynx-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/lynx-dev
