[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821772#comment-17821772 ]
Hudson commented on TIKA-4204:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1533 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1533/])
TIKA-4204 -- improve lookup of dataspace/storage items (tallison:
[https://github.com/apache/tika/commit/eefe884c81a2a94c212e5ed9aa5bbb659e653782])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmCommons.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/chm/TestChmLzxState.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmPmgiHeader.java
TIKA-4204 -- improve lookup of dataspace/storage items -- fix checkstyle (tallison:
[https://github.com/apache/tika/commit/1c1018950c88454ee9a91456931f9d18dde13124])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/chm/ChmExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/chm/TestChmLzxState.java
> ChmExtractor unable to decompress file
> --------------------------------------
>
> Key: TIKA-4204
> URL: https://issues.apache.org/jira/browse/TIKA-4204
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.9.1, 3.0.0-BETA
> Environment: The file I am trying to parse is attached; the entry
> incorrectly found as the content file is "/CSS/ABBContent.css"
> Reporter: Robert Fromholz
> Assignee: Tim Allison
> Priority: Blocker
> Fix For: 2.9.2, 3.0.0
>
> Attachments: 3HAC050917_TRM_RAPID_RW_6-en.chm
>
>
> ChmExtractor fails with error: "TikaException: can't copy beyond array
> length" when calling extractChmEntry on any non-empty entry.
> On inspection, the cause is an incorrectly set lzxBlockOffset, which in
> turn happens because ChmExtractor#getIndexOfContent returns the wrong
> entry.
> ChmCommons#indexOf(List, String) returns the first entry whose name
> contains the string "Content". The file I am trying to parse contains a
> file whose name also contains that string (/CSS/ABBContent.css), so that
> is the entry returned by #indexOf(...) instead of the actual content entry.
> To fix the issue, ChmCommons#indexOf(...) should be stricter in how it
> detects the content entry.
> According to [http://www.russotto.net/chm/chmformat.html], the name of the
> content entry always starts with "::DataSpace/Storage/", so the lookup
> could be restricted to entries with that prefix.
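Below is a minimal sketch of the stricter lookup suggested above. It makes several assumptions:

* Entry names are modelled as plain strings. Tika's real ChmCommons#indexOf works on directory listing entries, and the committed fix may look different.
* The class ContentEntryLookup, its methods indexOfContaining and indexOfStorageItem, and the sample listing are all hypothetical.
* The content stream is named "::DataSpace/Storage/MSCompressed/Content", as described in the CHM format notes linked in the report.

```java
import java.util.List;

/**
 * Hypothetical sketch contrasting the substring-based lookup described in
 * TIKA-4204 with a lookup restricted to the "::DataSpace/Storage/" namespace.
 * Not Tika's actual implementation.
 */
final class ContentEntryLookup {

    /** Prefix shared by the DataSpace storage streams in a CHM directory listing. */
    static final String STORAGE_PREFIX = "::DataSpace/Storage/";

    /** Reported behaviour: the first name that merely contains the pattern wins. */
    static int indexOfContaining(List<String> names, String pattern) {
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).contains(pattern)) {
                return i;
            }
        }
        return -1; // no match
    }

    /**
     * Stricter lookup: the name must live under ::DataSpace/Storage/ and its
     * last path segment must equal the pattern exactly.
     */
    static int indexOfStorageItem(List<String> names, String pattern) {
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            if (name.startsWith(STORAGE_PREFIX) && name.endsWith("/" + pattern)) {
                return i;
            }
        }
        return -1; // no storage item with that name
    }

    public static void main(String[] args) {
        // Sample listing (hypothetical): a user file whose name contains "Content"
        // precedes the real compressed content stream.
        List<String> names = List.of(
                "/CSS/ABBContent.css",
                "/index.htm",
                "::DataSpace/Storage/MSCompressed/Content");
        System.out.println(indexOfContaining(names, "Content"));  // 0: the CSS file
        System.out.println(indexOfStorageItem(names, "Content")); // 2: the storage stream
    }
}
```

The sketch requires both the prefix and an exact final path segment. With that check, a user file such as /CSS/ABBContent.css can no longer match, regardless of where it appears in the directory listing.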
--
This message was sent by Atlassian Jira
(v8.20.10#820010)