On 06/09/13 16:34, Gervase Markham wrote:
> Data! Sounds like a plan.
>
> Or we could ask our friends at Google or some other search engine to run
> a version of our detector over their index and see how often it says
> "UTF-8" when our normal algorithm would say something else.
>
> Gerv
This website has an interesting and apparently up-to-date set of
statistics:
http://w3techs.com/technologies/overview/character_encoding/all
Their current top ten encodings, as of today, are:
UTF-8: 76.7%
ISO-8859-1: 11.7%
Windows-1251 (Cyrillic): 2.9%
GB2312 (Chinese): 2.5%
Shift JIS (Japanese): 1.5%
Windows-1252 (superset of ISO-8859-1): 1.4%
GBK (Chinese): 0.7%
ISO-8859-2 (Eastern Europe, Latin script): 0.4%
EUC-JP (Japanese): 0.4%
Windows-1256 (Arabic): 0.4%
The exact interpretation of these results is tricky, since they don't
say how they define and detect these encodings. But if their results
are even approximately right, it's pretty clear that UTF-8 is now by
far the most common single charset/encoding on the web.
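
For what it's worth, the comparison Gerv suggests could be sketched
roughly as below. This is only an illustration, not our actual detector:
looks_like_utf8 and tally_disagreements are made-up names, the
"non-ASCII and decodes strictly as UTF-8" rule is an assumed heuristic,
and fallback_guess is a stand-in for whatever our normal algorithm
would return for a page.

```python
from collections import Counter


def looks_like_utf8(data: bytes) -> bool:
    """Assumed heuristic: valid UTF-8 with at least one non-ASCII byte."""
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    # Pure-ASCII pages decode identically under most legacy encodings,
    # so they tell us nothing and are not counted as UTF-8 here.
    return any(b >= 0x80 for b in data)


def tally_disagreements(pages, fallback_guess):
    """Count pages the UTF-8 check accepts but the fallback labels otherwise.

    pages: iterable of raw page bytes.
    fallback_guess: hypothetical callable mapping bytes to an encoding label.
    Returns a Counter keyed by the fallback's (non-UTF-8) label.
    """
    counts = Counter()
    for data in pages:
        legacy = fallback_guess(data)
        if looks_like_utf8(data) and legacy.lower() != "utf-8":
            counts[legacy] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "café".encode("utf-8"),         # valid UTF-8, non-ASCII
        "café".encode("windows-1252"),  # 0xE9 alone is not valid UTF-8
        b"plain ascii",                 # ambiguous, ignored
    ]
    print(tally_disagreements(sample, lambda data: "windows-1252"))
```

Run over a large enough crawl, the per-label counts would show which
legacy guesses are most often wrong about pages that are really UTF-8.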
-- N.
_______________________________________________
dev-platform mailing list
https://lists.mozilla.org/listinfo/dev-platform