[ https://issues.apache.org/jira/browse/TIKA-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029896#comment-18029896 ]
ASF GitHub Bot commented on TIKA-4514:
--------------------------------------
tballison merged PR #2364:
URL: https://github.com/apache/tika/pull/2364
> RUnpackExtractor should use stream translator
> ---------------------------------------------
>
> Key: TIKA-4514
> URL: https://issues.apache.org/jira/browse/TIKA-4514
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> When recursively extracting literal bytes from files, the RUnpackExtractor
> copies the TikaInputStream (via TikaInputStream#getPath) and then processes
> the copy.
>
> The problem is that some file formats place an object in the TikaInputStream,
> not raw bytes. In TikaCLI, we have an example of using the
> DefaultStreamEmbeddedStreamTranslator to convert an OLE object to raw bytes.
> We should update the RUnpackExtractor to use the same pattern.
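The translate-before-copy pattern described in the issue can be sketched as below. This is a minimal, self-contained sketch that does not use Tika itself: `SketchStream`, `OleObject`, `StreamTranslator`, `OleTranslator` and `UnpackSketch` are hypothetical names. `SketchStream` stands in for a TikaInputStream that may carry a parser-attached object instead of raw bytes, and the `shouldTranslate`/`translate` split is only assumed to resemble the contract of the translator used in TikaCLI, not to reproduce its real signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a TikaInputStream: raw bytes plus an optional
// parser-specific object attached in place of (or alongside) those bytes.
final class SketchStream {
    final byte[] rawBytes;        // what a plain byte copy would see
    final Object openContainer;   // parser-attached object, may be null

    SketchStream(byte[] rawBytes, Object openContainer) {
        this.rawBytes = rawBytes;
        this.openContainer = openContainer;
    }
}

// Hypothetical object a parser might attach for an embedded OLE object.
final class OleObject {
    final byte[] nativeBytes;

    OleObject(byte[] nativeBytes) {
        this.nativeBytes = nativeBytes;
    }
}

// Hypothetical translator contract: decide first, then convert to bytes.
interface StreamTranslator {
    boolean shouldTranslate(SketchStream stream);

    InputStream translate(SketchStream stream) throws IOException;
}

// Converts an attached OleObject back into its literal bytes.
final class OleTranslator implements StreamTranslator {
    public boolean shouldTranslate(SketchStream s) {
        return s.openContainer instanceof OleObject;
    }

    public InputStream translate(SketchStream s) {
        return new ByteArrayInputStream(((OleObject) s.openContainer).nativeBytes);
    }
}

// Hypothetical unpacker step: translate when possible, else copy raw bytes.
final class UnpackSketch {
    private final StreamTranslator translator;

    UnpackSketch(StreamTranslator translator) {
        this.translator = translator;
    }

    // Returns the bytes that should be written out for one embedded file.
    byte[] bytesToWrite(SketchStream s) {
        if (translator.shouldTranslate(s)) {
            try (InputStream in = translator.translate(s)) {
                return in.readAllBytes();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Fallback: the previous behavior, a straight copy of the raw bytes.
        return s.rawBytes.clone();
    }
}
```

Checking the translator before falling back to the raw copy means formats that already expose literal bytes are handled exactly as before; only streams carrying an attached object take the new path.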
--
This message was sent by Atlassian Jira
(v8.20.10#820010)