[ 
https://issues.apache.org/jira/browse/TIKA-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075700#comment-16075700
 ] 

Tim Allison commented on TIKA-2421:
-----------------------------------

Yes, I was wondering about this case.

> HTML Encoding Detector should ignore UTF-16 and UTF-32
> ------------------------------------------------------
>
>                 Key: TIKA-2421
>                 URL: https://issues.apache.org/jira/browse/TIKA-2421
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Nick C
>            Priority: Minor
>
> HTMLEncodingDetector interprets the head as ASCII when parsing the meta tag 
> for a possible encoding. It should ignore meta tags that specify UTF-16 or 
> UTF-32, because the page obviously can't be in one of those encodings if 
> its meta tag is readable as ASCII/UTF-8.
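
For illustration, the behaviour requested in the description could look roughly like the sketch below. This is not Tika's actual HtmlEncodingDetector code: the class name MetaCharsetSketch, the method detect, and the regular expression are all hypothetical. It only assumes, as the issue states, that the head is decoded as ASCII before the meta tag is searched.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Tika's implementation.
final class MetaCharsetSketch {
    // Assumed pattern: captures the value of charset=... inside a meta tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "(?is)<meta\\s+[^>]*?charset\\s*=\\s*[\"']?([a-z0-9_:.\\-]+)");

    /** Returns the charset declared in a meta tag, or null if none is usable. */
    static Charset detect(byte[] prefix) {
        // Decode the head as ASCII, as the issue describes.
        String head = new String(prefix, StandardCharsets.US_ASCII);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1).toUpperCase(Locale.ROOT);
        // If the tag was readable as ASCII, the page cannot really be
        // UTF-16 or UTF-32, so such a declaration is ignored.
        if (name.startsWith("UTF-16") || name.startsWith("UTF-32")) {
            return null;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return null;
        }
    }
}
```

Returning null here stands for "no opinion", so that a later detector in the chain can still make a guess from the raw bytes.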



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
