Nick C created TIKA-2421:
----------------------------

             Summary: HTML Encoding Detector should ignore UTF-16 and UTF-32
                 Key: TIKA-2421
                 URL: https://issues.apache.org/jira/browse/TIKA-2421
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.13
            Reporter: Nick C
            Priority: Minor


HTMLEncodingDetector interprets the head of the document as ASCII when it parses
the meta tag for a possible encoding. It should ignore HTML pages whose meta tag
specifies UTF-16 or UTF-32: if the meta tag is readable as ASCII/UTF-8, the page
obviously cannot be encoded in UTF-16 or UTF-32.
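
To illustrate the requested behaviour, here is a minimal standalone sketch. It is
not Tika's actual implementation: the class name MetaCharsetSketch, the method
detect, the MARK_LIMIT value and the regular expression are all assumptions made
for this example. It decodes the first bytes as ASCII, looks for a charset
declared in a meta tag, and returns null (no opinion) when the declared charset
is UTF-16 or UTF-32.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetSketch {
    // Hypothetical limit on how many leading bytes are scanned.
    private static final int MARK_LIMIT = 8192;

    // Simplified pattern (an assumption, not Tika's): captures the value
    // that follows "charset=" inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "(?is)<meta\\b[^>]*?charset\\s*=\\s*[\"']?([^\\s\"'/>;]+)");

    /** Returns the charset declared in a meta tag, or null if none is usable. */
    static Charset detect(byte[] input) {
        int len = Math.min(input.length, MARK_LIMIT);
        // Read the head as ASCII, as described in the issue.
        String head = new String(input, 0, len, StandardCharsets.US_ASCII);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1).toUpperCase(Locale.ROOT);
        // The tag was readable as ASCII, so the page cannot really be
        // UTF-16 or UTF-32: ignore the declaration.
        if (name.startsWith("UTF-16") || name.startsWith("UTF-32")) {
            return null;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name.
            return null;
        }
    }
}
```

Returning null here stands for "no detection result", so that a caller could
fall through to another detector. That convention is also an assumption of this
sketch.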



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
