[
https://issues.apache.org/jira/browse/TIKA-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541256#comment-17541256
]
Hudson commented on TIKA-3774:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #597 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/597/])
TIKA-3774: fix ignoreCharsets param of Icu4jEncodingDetector (lfcnassif:
[https://github.com/apache/tika/commit/768526160b3d12fc4df4671e093e101ccc44eb22])
* (add)
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/test/resources/test-documents/test_ignore_IBM420.html
* (edit)
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/test/resources/test-configs/tika-config-ignore-charset.xml
* (edit)
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/main/java/org/apache/tika/parser/txt/Icu4jEncodingDetector.java
* (edit)
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/test/java/org/apache/tika/parser/txt/CharsetDetectorTest.java
> Fix ignoreCharsets param of Icu4jEncodingDetector
> -------------------------------------------------
>
> Key: TIKA-3774
> URL: https://issues.apache.org/jira/browse/TIKA-3774
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.4.0
> Reporter: Luís Filipe Nassif
> Assignee: Luís Filipe Nassif
> Priority: Minor
> Fix For: 2.4.1
>
> Attachments: test_avoid_IBM420_charset.html
>
>
> The ignoreCharsets parameter was introduced in TIKA-3516 so that undesired
> charsets can be excluded in advance, but it is not working as expected:
> detection returns as soon as the first ignored charset is found, when it
> should continue to the next candidate charsets. The attached (corrupted) file
> used to be detected as windows-1252 by Tika 1.x, but it is now detected as
> IBM420 after TIKA-3516. The ignoreCharsets param should be able to ignore
> IBM420. I'll push a fix shortly.
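
The control-flow difference described in the issue can be illustrated with a
minimal, self-contained sketch. This is not the actual Icu4jEncodingDetector
code: the class name IgnoreCharsetsSketch, both method names, and the ranked
candidate list are hypothetical. What the buggy path returns is also an
assumption (null here), since the issue only says that it returns early.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from the Tika code base.
public class IgnoreCharsetsSketch {

    // Early-return behavior as described in the issue: the loop gives up as
    // soon as it meets an ignored charset (returning null is an assumption).
    public static String firstAllowedBuggy(List<String> rankedCandidates, Set<String> ignored) {
        for (String name : rankedCandidates) {
            if (ignored.contains(name)) {
                return null; // stops without looking at the remaining candidates
            }
            return name;
        }
        return null;
    }

    // Intended behavior: skip ignored charsets and move on to the next candidate.
    public static String firstAllowedFixed(List<String> rankedCandidates, Set<String> ignored) {
        for (String name : rankedCandidates) {
            if (ignored.contains(name)) {
                continue; // ignored charset, try the next one
            }
            return name;
        }
        return null; // every candidate was ignored, or the list was empty
    }

    public static void main(String[] args) {
        // Assumed ranking for the corrupted test file: IBM420 first, windows-1252 second.
        List<String> ranked = Arrays.asList("IBM420", "windows-1252");
        Set<String> ignored = new HashSet<>(Collections.singletonList("IBM420"));

        System.out.println(firstAllowedBuggy(ranked, ignored)); // prints "null"
        System.out.println(firstAllowedFixed(ranked, ignored)); // prints "windows-1252"
    }
}
```

With IBM420 in the ignore set, the fixed loop falls through to the next
candidate instead of stopping, which matches the windows-1252 result that the
issue reports for Tika 1.x.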