Gerard Bouchar created TIKA-2673:
------------------------------------

    Summary: HtmlEncodingDetector doesn't follow the specification
    Key: TIKA-2673
    URL: https://issues.apache.org/jira/browse/TIKA-2673
    Project: Tika
    Issue Type: Bug
    Reporter: Gerard Bouchar
    Attachments: HtmlEncodingDetectorTest.java

This bug is related to TIKA-2671, but it concerns the byte-based detection itself rather than metadata. While reading the specification, I collected a list of sample cases where HtmlEncodingDetector deviates from it and therefore fails to detect the right charset. I am attaching the test cases to this issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)