Gerard Bouchar created TIKA-2673:
------------------------------------
Summary: HtmlEncodingDetector doesn't follow the specification
Key: TIKA-2673
URL: https://issues.apache.org/jira/browse/TIKA-2673
Project: Tika
Issue Type: Bug
Reporter: Gerard Bouchar
Attachments: HtmlEncodingDetectorTest.java
This bug is related to TIKA-2671, but it concerns the bytes-based detection
itself rather than metadata.
While reading the specification, I collected a list of sample cases where
HtmlEncodingDetector deviates from the specification and therefore fails to
detect the right charset.
I am attaching the test cases to this issue (HtmlEncodingDetectorTest.java).