On 8/30/13 12:24, Mike Hoye wrote:
On 2013-08-30 11:17 AM, Adam Roach wrote:

It seems to me that there's an important balance here between (a) letting developers discover their configuration error and (b) allowing users to render misconfigured content without specialized knowledge.

For what it's worth, Internet Explorer handled this (before UTF-8 and caring about JS performance were a thing) by guessing what encoding to use, comparing a letter-frequency analysis of a page's content against a table of which bytes are most common in which encodings of which languages.
...
From both the developer and user perspectives, it amounted to "something went wrong because of bad magic."

I'd like to clarify two points about what I'm proposing.

First, I'm not proposing that we do anything without explicit user intervention, other than present an unobtrusive bar helping the user understand why the headline they're trying to read renders as "Ð' Ð"оÑ?дÑfме пÑEURедложили оÑ,обÑEURаÑ,ÑOE "Ð?обелÑ?" Ñf Ðz(бамÑ< " rather than "В Госдуме предложили отобрать "Нобеля" у Обамы". (No political statement intended here -- that's just the leading headline on Pravda at the moment.)

If the user is happy with the encoding, they do nothing and go about their business.

If the user determines that the rendering is, in fact, not what they want, they can simply click on the "Yes" button and (with high probability), everything is right with the world again.
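To make the failure mode concrete, here is a minimal Python sketch of what the "Yes" button would amount to: decoding the same page bytes a second time, as UTF-8 rather than as a single-byte Western encoding. The sample string, the function name `render`, and the choice of windows-1252 (`cp1252`) as the wrong fallback are assumptions for illustration only, not a description of what any browser actually does.

```python
def render(raw: bytes, encoding: str) -> str:
    """Decode raw page bytes under the given encoding.

    errors="replace" substitutes U+FFFD for bytes the encoding leaves
    undefined, so a wrong guess yields garbage instead of an exception.
    """
    return raw.decode(encoding, errors="replace")


# Hypothetical page content: an arbitrary Cyrillic sample, sent as UTF-8.
sample = "Привет"
raw = sample.encode("utf-8")

# Wrong guess: each two-byte UTF-8 sequence becomes two unrelated characters.
garbled = render(raw, "cp1252")

# The "Yes" path: decode the same bytes again, this time as UTF-8.
fixed = render(raw, "utf-8")

print(garbled)
print(fixed)
```

Nothing is lost by the wrong guess, since the original bytes are still on hand; only the interpretation changes.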

Also note that I'm not proposing that we try to do generic character set and language detection. That's fraught with the perils you cite. The topic we're discussing here is UTF-8, which can be easily detected with extremely high confidence.
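As a rough sketch of why this is so much easier than general detection: UTF-8 multi-byte sequences have a rigid structure, so text in a legacy encoding almost never passes a strict UTF-8 decode by accident. The function name `looks_like_utf8` and the sample string below are assumptions for illustration; a real implementation would work incrementally on the network stream rather than on a complete buffer.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 with at least one non-ASCII byte."""
    # Pure ASCII decodes identically under most legacy encodings,
    # so it gives no evidence either way.
    if all(b < 0x80 for b in data):
        return False
    try:
        data.decode("utf-8")  # strict decoding: any malformed sequence raises
    except UnicodeDecodeError:
        return False
    return True


sample = "Привет, мир"  # arbitrary sample text, not taken from any real page

print(looks_like_utf8(sample.encode("utf-8")))         # UTF-8 bytes
print(looks_like_utf8(sample.encode("windows-1251")))  # legacy Cyrillic bytes
print(looks_like_utf8(b"plain ASCII"))                 # no evidence either way
```

The windows-1251 bytes fail because each high byte would have to be followed by a continuation byte in the range 0x80-0xBF to form valid UTF-8, and here it is not.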

--
Adam Roach
Principal Platform Engineer
a...@mozilla.com
+1 650 903 0800 x863
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
