Nick C created TIKA-2421:
----------------------------
Summary: HTML Encoding Detector should ignore UTF-16 and UTF-32
Key: TIKA-2421
URL: https://issues.apache.org/jira/browse/TIKA-2421
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.13
Reporter: Nick C
Priority: Minor
HTMLEncodingDetector interprets the head of the document as ASCII when parsing
the meta tag for a possible encoding. It should ignore a declared encoding of
UTF-16 or UTF-32: if the meta tag could be read as ASCII/UTF-8, the page
obviously cannot actually be in either of those encodings.
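
A minimal sketch of the proposed behaviour, to make the request concrete. It is
not Tika's actual implementation: the class name MetaCharsetSketch, the detect
method and the simplified regular expression are all hypothetical and stand in
for whatever the real detector does. It decodes the head as ASCII, looks for a
charset in a meta tag, and discards the result when that charset is a UTF-16 or
UTF-32 variant.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; names and pattern are assumptions.
class MetaCharsetSketch {

    // Simplified pattern: finds charset=... inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "(?is)<meta\\s+[^>]*?charset\\s*=\\s*[\"']?([a-z0-9_:.\\-]+)");

    /** Returns the charset declared in a meta tag, or null if none is usable. */
    static Charset detect(byte[] head) {
        // The head is read as ASCII, as described in the issue.
        String ascii = new String(head, StandardCharsets.US_ASCII);
        Matcher m = META_CHARSET.matcher(ascii);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1).toUpperCase(Locale.ROOT);
        // The tag was readable as ASCII, so a UTF-16/UTF-32 claim
        // cannot be true; ignore it rather than report it.
        if (name.startsWith("UTF-16") || name.startsWith("UTF-32")) {
            return null;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: treat as no answer.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] page = "<html><head><meta charset=\"UTF-16\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(page));
    }
}
```

Returning null here is meant as "no opinion", on the assumption that another
detector later in the chain then gets a chance to decide the encoding.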
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)