On Wed, Aug 28, 2013 at 3:20 PM, Axel Hecht <l...@mozilla.com> wrote:

> On 8/28/13 1:12 PM, Anne van Kesteren wrote:
>
>> On Wednesday, February 27, 2013 12:28:43 PM UTC, Axel Hecht wrote:
>>
>>> That's rather orthogonal to what you're currently trying to do, but it's
>>> also indicating to me that we should remove all of those settings from
>>> intl.properties, and just leave accept-lang, and deduce the rest.
>>>
>> So how about the parser just accepts a locale value and implements the
>> locale-to-fallback encoding map? Given the numerous problems discovered[1],
>> locale-defaults actually being part of the HTML Standard, and the fact that
>> exposing it as a changeable option encourages people to tweak it, I think
>> that would be a better way forward.
>>
> I don't think that 'a locale value' is correct.


It's not, logically, but it's what we and other browsers currently use in
the absence of a better solution. Moving to what Anne suggested plus my
elaboration would not make us worse off compared to the status quo.
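
To make the proposal concrete, here is a minimal sketch of a
locale-to-fallback-encoding lookup. The names are hypothetical and the
entries are only an illustrative subset; the authoritative mapping is the
table in the HTML Standard, not this dictionary.

```python
# Illustrative subset only (an assumption for this sketch); the real
# mapping is the locale table in the HTML Standard.
FALLBACK_BY_LOCALE = {
    "ja": "shift_jis",
    "ru": "windows-1251",
    "tr": "windows-1254",
    "zh-cn": "gbk",
    "zh-tw": "big5",
}
DEFAULT_FALLBACK = "windows-1252"


def fallback_encoding(locale: str) -> str:
    """Return the fallback encoding for a locale tag such as 'zh-TW'."""
    tag = locale.strip().lower().replace("_", "-")
    while tag:
        if tag in FALLBACK_BY_LOCALE:
            return FALLBACK_BY_LOCALE[tag]
        # Drop the last subtag and retry: 'ja-jp' -> 'ja' -> ''.
        tag = tag.rpartition("-")[0]
    return DEFAULT_FALLBACK
```

With a single lookup like this, the parser needs only the locale as input
and there is no separate fallback-encoding setting left to tweak.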


> We should use content languages and not UI language. But from the list of
> preferred content languages, we can help the parser.


I'm not at all fond of the idea of making *that* obscure piece of
configurability have parser behavior implications.

If we want to use inputs to the guessing other than the inputs we are using
today, that's a research project and not a bug fix project. If I were
starting such a research project, I'd start by testing hypotheses about TLD
correlation with legacy encodings. The first thing I'd like to test would
be whether it would be an improvement to make builds that have Traditional
Chinese as the UI language use gbk (as opposed to big5) as the fallback
encoding when browsing content loaded from a .cn domain.
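
As a sketch of the hypothesis to be tested (not of any existing behavior),
a TLD-aware guess could look roughly like this. The function name, both
tables, and the rule that the TLD wins over the UI locale are assumptions
made up for illustration.

```python
# Hypothetical tables for the experiment; neither is shipped anywhere.
LOCALE_FALLBACK = {"zh-tw": "big5", "zh-cn": "gbk"}
TLD_FALLBACK = {"cn": "gbk"}


def fallback_for(ui_locale: str, hostname: str) -> str:
    """Guess a fallback encoding, letting the TLD override the UI locale."""
    # The last dot-separated label of the host name is treated as the TLD.
    tld = hostname.rstrip(".").rpartition(".")[2].lower()
    if tld in TLD_FALLBACK:
        return TLD_FALLBACK[tld]
    return LOCALE_FALLBACK.get(ui_locale.lower(), "windows-1252")
```

Whether such an override actually improves matters is exactly what the
research would have to measure against real content.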


> It is a bit more tricky in general than we have right now, as for some
> users, we'll end up with mismatches between the fallback encodings. We
> could just use the first language for which we have one, though. At least
> as first step.
>

I'd rather not block solving the problem raised in this thread on research
about how well novel inputs to the guessing process would work.
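
For reference, the "first language for which we have one" step quoted
above could be sketched as follows. This is an assumption-laden
illustration: the table is a made-up subset, and the input is taken to be
a comma-separated list of language tags, optionally with q-values.

```python
# Made-up subset of a language-to-fallback table, for illustration only.
FALLBACK_BY_LANGUAGE = {"ja": "shift_jis", "ru": "windows-1251", "zh-tw": "big5"}


def first_known_fallback(accept_languages: str, default: str = "windows-1252") -> str:
    """Return the fallback for the first listed language that has one."""
    for item in accept_languages.split(","):
        # Strip an optional ';q=0.8' parameter and normalize case.
        tag = item.split(";")[0].strip().lower()
        if tag in FALLBACK_BY_LANGUAGE:
            return FALLBACK_BY_LANGUAGE[tag]
    return default
```

A user listing "en-US, ru, ja" would get windows-1251 here, which shows
the kind of mismatch between fallback encodings mentioned in the quoted
text.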


> I don't know which locale-defaults are part of the html spec, before I
> read it all, can you elaborate?
>

See the table under step 9 of
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding

-- 
Henri Sivonen
hsivo...@hsivonen.fi
http://hsivonen.iki.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
