[ https://issues.apache.org/jira/browse/TIKA-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075700#comment-16075700 ]
Tim Allison commented on TIKA-2421:
-----------------------------------
Yes, I was wondering about this case.
> HTML Encoding Detector should ignore UTF-16 and UTF-32
> ------------------------------------------------------
>
> Key: TIKA-2421
> URL: https://issues.apache.org/jira/browse/TIKA-2421
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.13
> Reporter: Nick C
> Priority: Minor
>
> HTMLEncodingDetector interprets the head as ASCII when parsing the meta tag
> for a possible encoding. It should ignore HTML pages whose meta tag specifies
> UTF-16 or UTF-32: if the detector could read the meta tag as ASCII/UTF-8, the
> page obviously can't actually be encoded in UTF-16 or UTF-32.
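A minimal sketch of the behaviour the issue asks for, assuming a detector that reads the start of the document as a single-byte, ASCII-compatible encoding (ISO-8859-1 here) and takes the first charset=... value it finds. The class MetaCharsetSniffer and its sniff method are hypothetical names for illustration only; this is not Tika's actual HTMLEncodingDetector code or API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not Tika's HTMLEncodingDetector: looks for a meta
 * charset declaration in the head bytes and ignores UTF-16/UTF-32 values.
 */
final class MetaCharsetSniffer {

    // Matches charset=NAME in both <meta charset="..."> and the
    // http-equiv Content-Type form; the opening quote is optional.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    private MetaCharsetSniffer() {}

    /** Returns the declared charset, or null if none usable was found. */
    static Charset sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so ASCII markup
        // such as the meta tag is recovered unchanged.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) {
            return null;
        }
        Charset cs;
        try {
            cs = Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return null;
        }
        // Compare the canonical name so aliases (e.g. "UnicodeBig") are caught.
        // If we could read the declaration as ASCII, the page cannot really be
        // UTF-16 or UTF-32, so such a declaration is ignored.
        String canonical = cs.name().toUpperCase(Locale.ROOT);
        if (canonical.contains("UTF-16") || canonical.contains("UTF-32")) {
            return null;
        }
        return cs;
    }
}
```

With this sketch, a head containing <meta charset="UTF-16"> yields null (so the caller falls back to other detectors), while <meta charset='ISO-8859-1'> yields ISO-8859-1. Checking the canonical name after Charset.forName, rather than the raw string, is a design choice so that aliases of UTF-16/UTF-32 are also rejected.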
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)