[ https://issues.apache.org/jira/browse/TIKA-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-2050.
-------------------------------
Resolution: Won't Fix
If we want to increase the buffer in the future or make it configurable, we can
reopen this issue.
> HTMLEncodingDetector class fails on some HTML documents
> -------------------------------------------------------
>
> Key: TIKA-2050
> URL: https://issues.apache.org/jira/browse/TIKA-2050
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Shabanali Faghani
> Priority: Minor
> Attachments: false-negative-responce-from-HTMLEncodingDetector.zip
>
>
> When [[email protected]] and I were working on
> [TIKA-2038|https://issues.apache.org/jira/browse/TIKA-2038] I found out that
> HTMLEncodingDetector class cannot extract charsets from some HTML documents.
> I’ve attached the HTML documents on which HTMLEncodingDetector fails. It
> seems that its regex should be corrected to cover these cases.
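
For readers following this thread, here is a minimal sketch of how a regex-based charset check over a bounded prefix of an HTML document might look, and why the buffer size mentioned in the resolution matters. It is not Tika's actual implementation: the class name MetaCharsetSketch, the method detect, the pattern, and the 8192-byte limit used in the usage note are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code of HTMLEncodingDetector.
final class MetaCharsetSketch {

    // Assumed pattern: finds charset=<name> inside a <meta ...> tag,
    // with or without quotes, in either the HTML5 or http-equiv form.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*?charset\\s*=\\s*[\"']?\\s*([A-Za-z0-9_:.\\-]+)",
            Pattern.CASE_INSENSITIVE);

    // Looks for a declared charset in at most the first bufferSize bytes.
    static Optional<String> detect(byte[] html, int bufferSize) {
        int len = Math.min(html.length, bufferSize);
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives
        // decoding regardless of the document's real encoding.
        String head = new String(html, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

With this sketch, a call such as MetaCharsetSketch.detect(bytes, 8192) returns an empty result whenever the meta declaration starts beyond the first 8192 bytes, for example after a long comment or inline script. That is the kind of false negative a larger or configurable buffer would address, while a declaration written in a form the pattern does not cover would need a regex change instead.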