[
https://issues.apache.org/jira/browse/TIKA-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029911#comment-18029911
]
Hudson commented on TIKA-4514:
------------------------------
UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk17 #955 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/955/])
TIKA-4514 (#2364) (github:
[https://github.com/apache/tika/commit/2af3ae0f3e142e53c12ffcfa99cd0bcf8ccb8151])
* (edit) tika-core/src/main/java/org/apache/tika/extractor/RUnpackExtractor.java
> RUnpackExtractor should use stream translator
> ---------------------------------------------
>
> Key: TIKA-4514
> URL: https://issues.apache.org/jira/browse/TIKA-4514
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> When recursively extracting literal bytes from files, the RUnpackExtractor
> copies the TikaInputStream (via TikaInputStream#getPath), and then processes
> that.
>
> The problem is that some file formats place an object in the TikaInputStream
> rather than raw bytes. In TikaCLI, we have an example of using the
> DefaultEmbeddedStreamTranslator to convert an OLE object to raw bytes.
> We should update the RUnpackExtractor to follow the same pattern.
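
For illustration only, the pattern described in the issue can be sketched in
self-contained Java. Nothing below is Tika API: ObjectCarryingStream,
StreamTranslator, ToyTranslator and bytesToUnpack are hypothetical stand-ins
invented for this sketch, and the toy translator merely serializes the carried
object with toString(). The actual change is in the commit linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class TranslatorSketch {

    /** Hypothetical stream that may carry a parsed object instead of raw bytes. */
    static final class ObjectCarryingStream extends ByteArrayInputStream {
        private final Object openContainer; // null when the stream holds plain bytes

        ObjectCarryingStream(byte[] bytes, Object openContainer) {
            super(bytes);
            this.openContainer = openContainer;
        }

        Object getOpenContainer() {
            return openContainer;
        }
    }

    /** Hypothetical translator contract: decide, then convert to raw bytes. */
    interface StreamTranslator {
        boolean shouldTranslate(InputStream in);

        InputStream translate(InputStream in);
    }

    /** Toy translator (assumption): turns the carried object into UTF-8 bytes. */
    static final class ToyTranslator implements StreamTranslator {
        @Override
        public boolean shouldTranslate(InputStream in) {
            return in instanceof ObjectCarryingStream
                    && ((ObjectCarryingStream) in).getOpenContainer() != null;
        }

        @Override
        public InputStream translate(InputStream in) {
            Object container = ((ObjectCarryingStream) in).getOpenContainer();
            byte[] raw = container.toString().getBytes(StandardCharsets.UTF_8);
            return new ByteArrayInputStream(raw);
        }
    }

    /** Returns the bytes an unpacker would write out, translating first if needed. */
    static byte[] bytesToUnpack(InputStream in, StreamTranslator translator) {
        // Ask the translator before copying, so an embedded object is not
        // mistaken for an (empty or meaningless) run of raw bytes.
        InputStream source = translator.shouldTranslate(in) ? translator.translate(in) : in;
        try {
            return source.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StreamTranslator translator = new ToyTranslator();
        byte[] plain = "abc".getBytes(StandardCharsets.UTF_8);

        // Plain bytes pass through untouched.
        byte[] fromBytes = bytesToUnpack(new ObjectCarryingStream(plain, null), translator);
        // An object-carrying stream is translated to bytes first.
        byte[] fromObject =
                bytesToUnpack(new ObjectCarryingStream(new byte[0], "OLE-PAYLOAD"), translator);

        System.out.println(new String(fromBytes, StandardCharsets.UTF_8));
        System.out.println(new String(fromObject, StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is the ordering: the translation check runs before any
copy of the stream is made, so streams that hold an object are converted to
bytes and everything else is copied unchanged.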