[ 
https://issues.apache.org/jira/browse/TIKA-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luís Filipe Nassif resolved TIKA-3774.
--------------------------------------
    Resolution: Fixed

Fixed by commit d5b66db06598dc1aa0c1dcc9bceb9fd1e13a9c52

> Fix ignoreCharsets param of Icu4jEncodingDetector
> -------------------------------------------------
>
>                 Key: TIKA-3774
>                 URL: https://issues.apache.org/jira/browse/TIKA-3774
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.0
>            Reporter: Luís Filipe Nassif
>            Assignee: Luís Filipe Nassif
>            Priority: Minor
>             Fix For: 2.4.1
>
>         Attachments: test_avoid_IBM420_charset.html
>
>
> That parameter was introduced in TIKA-3516 to exclude undesired charsets up 
> front, but it is not working as expected: detection returns as soon as the 
> first ignored charset is found, when it should continue to the next 
> candidate charsets. The attached (corrupted) file used to be detected as 
> windows-1252 by Tika-1.x, but since TIKA-3516 it is detected as IBM420. The 
> ignoreCharsets param should be able to ignore IBM420. I'll push a fix shortly.
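
The control-flow problem described above can be illustrated with a minimal, hypothetical sketch. This is not the Tika source or the actual fix: the class name, method names, and the plain list of ranked charset names are assumptions made for illustration, and returning null on the early exit is also an assumption. It only contrasts "return on the first ignored charset" with "skip it and continue to the next candidate".

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the Icu4jEncodingDetector implementation.
public class IgnoreCharsetsSketch {

    // Buggy pattern: stops at the first ignored charset instead of moving on.
    // 'ranked' is an assumed list of candidate charset names, best match first.
    static String detectBuggy(List<String> ranked, Set<String> ignored) {
        for (String charset : ranked) {
            if (ignored.contains(charset)) {
                return null; // assumption: early exit with no result
            }
            return charset;
        }
        return null;
    }

    // Intended pattern: skip ignored charsets and try the next candidate.
    static String detectFixed(List<String> ranked, Set<String> ignored) {
        for (String charset : ranked) {
            if (ignored.contains(charset)) {
                continue; // ignored, so look at the next candidate
            }
            return charset;
        }
        return null; // every candidate was ignored, or there were none
    }
}
```

With candidates ranked as IBM420 then windows-1252 and IBM420 in the ignored set, the first method never reaches windows-1252, while the second returns it.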



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
