[ https://issues.apache.org/jira/browse/TIKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279572#comment-14279572 ]
Nick Burch commented on TIKA-1028:
----------------------------------
I've added a temporary workaround in r1652317. It doesn't seem quite right,
though: it feels like we should be giving the calling code more control over
skip/log/fail handling for inner attachment parsing.

In r1652318 I've added a partial unit test for this case. It would be good if
someone could confirm the password used on that test file. Then we can expand
that unit test into a proper negative test (without the password, the zip's
text is not found but the remainder is) and a positive test (when a password
provider is given, the zip's contents are found too).

Could someone ([~fuu]?) perhaps confirm the test file's encryption password to
help with that?
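
To make the skip/log/fail idea above a little more concrete, here is a minimal, self-contained sketch of what caller-controlled handling could look like. Every name in it (OnFailure, Part, Result, parsePart, parseAll) is hypothetical and made up for illustration. None of it is Tika's API, and the "encrypted" flag merely simulates an inner parser that cannot read an attachment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these names exist in Tika. */
public class EmbeddedFailurePolicySketch {

    /** What the caller wants done when one inner attachment cannot be parsed. */
    enum OnFailure { SKIP, LOG, FAIL }

    /** Stand-in for an attachment; 'encrypted' simulates an unreadable zip. */
    static final class Part {
        final String name;
        final String text;
        final boolean encrypted;

        Part(String name, String text, boolean encrypted) {
            this.name = name;
            this.text = text;
            this.encrypted = encrypted;
        }
    }

    /** Text that was extracted, plus the parts that were left untouched. */
    static final class Result {
        final List<String> texts = new ArrayList<>();
        final List<String> unparsed = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();
    }

    /** Simulated inner parser: refuses encrypted parts. */
    static String parsePart(Part part) {
        if (part.encrypted) {
            throw new IllegalStateException(
                    "unsupported feature encryption used in entry " + part.name);
        }
        return part.text;
    }

    /** Parses every part, applying the caller's policy to each failure. */
    static Result parseAll(List<Part> parts, OnFailure policy) {
        Result result = new Result();
        for (Part part : parts) {
            try {
                result.texts.add(parsePart(part));
            } catch (IllegalStateException e) {
                if (policy == OnFailure.FAIL) {
                    throw e; // abort the whole parse
                }
                if (policy == OnFailure.LOG) {
                    result.warnings.add(e.getMessage());
                }
                // SKIP and LOG both keep going and record the untouched part
                result.unparsed.add(part.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Part> parts = Arrays.asList(
                new Part("body.txt", "hello", false),
                new Part("secret.zip", "hidden", true),
                new Part("note.txt", "world", false));
        Result result = parseAll(parts, OnFailure.LOG);
        System.out.println("texts=" + result.texts);
        System.out.println("unparsed=" + result.unparsed);
        System.out.println("warnings=" + result.warnings);
    }
}
```

With SKIP or LOG the remaining parts are still returned and the unreadable attachment is listed as untouched, while FAIL keeps the abort-on-first-error behaviour. How such a policy would actually be passed into the parsers is left open here.
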
> Tika-server quits parsing of rfc-822 document prematurely when it encounters
> encrypted zip file as attachment.
> --------------------------------------------------------------------------------------------------------------
>
> Key: TIKA-1028
> URL: https://issues.apache.org/jira/browse/TIKA-1028
> Project: Tika
> Issue Type: Bug
> Components: mime, parser, server
> Affects Versions: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
> Reporter: Juha Haaga
> Attachments: encrypted-zip.msg
>
>
> The Zip parser in tika-server does not allow passing in a password for
> decrypting the zip file and doesn't handle the unsupported feature
> gracefully. The problem occurs when a zip file is attached to an email
> document being parsed. The parser gives up and throws an exception:
> WARNING: all: Unpacker failed
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
> org.apache.tika.parser.pkg.PackageParser@10fcc945
> Caused by:
> org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException:
> unsupported feature encryption used in entry
> Instead of returning the successfully parsed components, Tika-server returns
> nothing.
> It would be better to return the rest of the parsed document contents along
> with the untouched offending zip file in the archive that Tika-server returns
> as its result. Until zip file decryption is added, this would always return
> the untouched zip file. Once it is implemented, it should return the
> untouched zip file in cases where the wrong password was provided.