[ https://issues.apache.org/jira/browse/TIKA-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luís Filipe Nassif resolved TIKA-3774.
--------------------------------------
Resolution: Fixed
Fixed by d5b66db06598dc1aa0c1dcc9bceb9fd1e13a9c52.
> Fix ignoreCharsets param of Icu4jEncodingDetector
> -------------------------------------------------
>
> Key: TIKA-3774
> URL: https://issues.apache.org/jira/browse/TIKA-3774
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.4.0
> Reporter: Luís Filipe Nassif
> Assignee: Luís Filipe Nassif
> Priority: Minor
> Fix For: 2.4.1
>
> Attachments: test_avoid_IBM420_charset.html
>
>
> That parameter was introduced in TIKA-3516 to exclude undesired charsets in
> advance, but it is not working as expected: detection returns as soon as the
> first ignored charset is found, when it should continue to the next candidate
> charsets. The attached (corrupted) file used to be detected as windows-1252 by
> Tika-1.x, but is now detected as IBM420 after TIKA-3516. The ignoreCharsets
> param should be able to ignore IBM420. I'll push a fix shortly.
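
The difference between "return at the first ignored charset" and "continue to
the next charsets" can be illustrated with a minimal, self-contained sketch.
This is not the Tika source: the class IgnoreCharsetsSketch, the methods
detectBuggy and detectFixed, and the candidate list are all hypothetical, and
the ranked candidates are plain strings standing in for whatever the detector
produces.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and structure are assumptions,
// not the actual Icu4jEncodingDetector code.
public class IgnoreCharsetsSketch {

    // Behaviour described in the report: the loop gives up as soon as it
    // meets an ignored charset, so later candidates are never considered.
    public static String detectBuggy(List<String> rankedCandidates, Set<String> ignoreCharsets) {
        for (String name : rankedCandidates) {
            if (ignoreCharsets.contains(name)) {
                return null; // stops here instead of trying the next candidate
            }
            return name;
        }
        return null;
    }

    // Expected behaviour: skip ignored charsets and return the first
    // remaining candidate, or null if every candidate is ignored.
    public static String detectFixed(List<String> rankedCandidates, Set<String> ignoreCharsets) {
        for (String name : rankedCandidates) {
            if (ignoreCharsets.contains(name)) {
                continue; // move on to the next candidate
            }
            return name;
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed ranking for the sake of the example, best match first.
        List<String> candidates = List.of("IBM420", "windows-1252", "ISO-8859-1");
        Set<String> ignore = Set.of("IBM420");

        System.out.println("buggy: " + detectBuggy(candidates, ignore)); // prints "buggy: null"
        System.out.println("fixed: " + detectFixed(candidates, ignore)); // prints "fixed: windows-1252"
    }
}
```

The only change between the two methods is `return` versus `continue` inside
the ignore check, which is the shape of the problem described above.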
--
This message was sent by Atlassian Jira
(v8.20.7#820007)