Hans Brende created ANY23-411:
---------------------------------
Summary: Use Content-Type to help determine encoding
Key: ANY23-411
URL: https://issues.apache.org/jira/browse/ANY23-411
Project: Apache Any23
Issue Type: Bug
Components: encoding
Affects Versions: 2.3
Reporter: Hans Brende
Assignee: Hans Brende
Fix For: 2.3
Incredibly enough, it seems that our encoding detector does not take the
Content-Type header into account at all when trying to guess a document's
charset encoding!
This has caused a problem for me with the page:
http://w3c.github.io/microdata-rdf/tests/0065.html
Even though the Content-Type header is set to "text/html; charset=utf-8", we're
guessing the charset to be "IBM500", which turns the page into complete
gibberish.
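
As a rough illustration of the idea in the summary, here is a minimal Java sketch that reads the charset parameter from a Content-Type value and prefers it over a detector's guess. The class ContentTypeCharset and its methods fromContentType and choose are hypothetical names invented for this example; they are not part of the Any23 API, and the actual fix may instead treat the declared charset as one hint among several rather than as an override.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Any23.
final class ContentTypeCharset {

    // Matches a charset parameter such as ;charset=utf-8 or ; charset="utf-8"
    // (case-insensitive, optional quotes around the value).
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("(?i);\\s*charset\\s*=\\s*\"?([^\\s;\"]+)\"?");

    private ContentTypeCharset() {}

    // Returns the charset declared in the Content-Type value, if there is one
    // and the running JVM supports it.
    static Optional<Charset> fromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(m.group(1)));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // A malformed or unknown charset name is treated as "not declared".
            return Optional.empty();
        }
    }

    // Assumed policy: trust the declared charset when present, otherwise
    // fall back to whatever the byte-level detector guessed.
    static Charset choose(String contentType, Charset detected) {
        return fromContentType(contentType).orElse(detected);
    }
}
```

Under this assumed policy, calling choose with "text/html; charset=utf-8" and a detector guess of IBM500 would return UTF-8, so the page above would no longer be decoded with the wrong charset. Whether a declared charset should always win over detection is a design decision this sketch does not settle.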
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)