[ https://issues.apache.org/jira/browse/TIKA-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-1028.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.8

As of r1652866, I think we've got it working as well as we can for now. Because
Commons Compress doesn't currently support decrypting password-protected zips,
we can't get the contents of encrypted zip entries, even with the password.
However, we now show the zip entry names, we no longer abort, and we do manage
to get the text of a .txt in a normal (unencrypted) .zip in an rfc822 mail
attachment.
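The behavior described above (list every entry name, skip entries that cannot
be decrypted, and keep extracting the rest) can be sketched in plain Java. This
is a hypothetical illustration, not Tika's PackageParser code: the class
LenientZipLister and its Entry type are invented for this example. Because
java.util.zip.ZipInputStream rejects encrypted entries outright, the sketch
reads the zip central directory by hand and checks bit 0 of each entry's
general purpose flag, which the ZIP format uses to mark encryption. It assumes
a small, well-formed, non-ZIP64 archive with UTF-8 entry names and only STORED
or DEFLATED entries.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper (not Tika/Commons Compress code): lists every entry of an
// in-memory zip, returning the bytes of readable entries and only the name of
// encrypted ones. Assumes no ZIP64, UTF-8 names, STORED/DEFLATED methods.
final class LenientZipLister {

    static final class Entry {
        final String name;
        final boolean encrypted;
        final byte[] data; // null if encrypted or compressed with an unsupported method

        Entry(String name, boolean encrypted, byte[] data) {
            this.name = name;
            this.encrypted = encrypted;
            this.data = data;
        }
    }

    private static final int LOC_SIG = 0x04034b50;  // local file header
    private static final int CEN_SIG = 0x02014b50;  // central directory header
    private static final int EOCD_SIG = 0x06054b50; // end of central directory

    static List<Entry> list(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // The end-of-central-directory record is at least 22 bytes; scan backwards for it.
        int eocd = -1;
        for (int i = zip.length - 22; i >= 0 && eocd < 0; i--) {
            if (buf.getInt(i) == EOCD_SIG) eocd = i;
        }
        if (eocd < 0) throw new IllegalArgumentException("not a zip archive");

        int count = u16(buf, eocd + 10); // total number of entries
        int pos = buf.getInt(eocd + 16); // offset of the first central directory header
        List<Entry> entries = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            if (buf.getInt(pos) != CEN_SIG) throw new IllegalArgumentException("bad central directory");
            boolean encrypted = (u16(buf, pos + 8) & 1) != 0; // general purpose bit 0
            int method = u16(buf, pos + 10);
            int compSize = buf.getInt(pos + 20);
            int size = buf.getInt(pos + 24);
            int nameLen = u16(buf, pos + 28);
            int localOffset = buf.getInt(pos + 42);
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);

            // Always record the name; only read data we can actually decode.
            byte[] data = encrypted ? null : readData(buf, zip, localOffset, method, compSize, size);
            entries.add(new Entry(name, encrypted, data));
            pos += 46 + nameLen + u16(buf, pos + 30) + u16(buf, pos + 32); // skip name, extra, comment
        }
        return entries;
    }

    private static byte[] readData(ByteBuffer buf, byte[] zip, int loc, int method, int compSize, int size) {
        if (buf.getInt(loc) != LOC_SIG) throw new IllegalArgumentException("bad local header");
        // The local header carries its own name and extra-field lengths.
        int start = loc + 30 + u16(buf, loc + 26) + u16(buf, loc + 28);
        if (method == 0) return Arrays.copyOfRange(zip, start, start + compSize); // STORED
        if (method != 8) return null; // only DEFLATED is handled beyond this point

        Inflater inflater = new Inflater(true); // zip entries hold raw DEFLATE data
        try {
            // One extra trailing byte, as the Inflater docs ask for with nowrap.
            inflater.setInput(Arrays.copyOfRange(zip, start, start + compSize + 1));
            byte[] out = new byte[size];
            int done = 0;
            while (done < size) {
                int n = inflater.inflate(out, done, size - done);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) break;
                done += n;
            }
            return Arrays.copyOf(out, done);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt entry: " + e.getMessage(), e);
        } finally {
            inflater.end();
        }
    }

    private static int u16(ByteBuffer buf, int offset) {
        return Short.toUnsignedInt(buf.getShort(offset));
    }
}
```

Commons Compress offers a comparable check through
ArchiveInputStream.canReadEntryData(ArchiveEntry), which should let a parser
built on it skip an unreadable entry instead of letting
UnsupportedZipFeatureException abort the whole document.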

> Tika-server quits parsing of an rfc-822 document prematurely when it encounters
> an encrypted zip file as an attachment.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1028
>                 URL: https://issues.apache.org/jira/browse/TIKA-1028
>             Project: Tika
>          Issue Type: Bug
>          Components: mime, parser, server
>    Affects Versions: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
>            Reporter: Juha Haaga
>             Fix For: 1.8
>
>         Attachments: Document.zip, test.eml
>
>
> The Zip parser in tika-server does not allow passing in the password for
> decrypting the zip file and doesn't handle the unsupported feature
> gracefully. The problem happens when a zip file is attached as part of an
> email document being parsed; the parser gives up and throws an exception:
> WARNING: all: Unpacker failed
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from 
> org.apache.tika.parser.pkg.PackageParser@10fcc945
> Caused by: 
> org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: 
> unsupported feature encryption used in entry
> Instead of returning the successfully parsed components, Tika-server returns
> nothing.
> It would be better to return the rest of the parsed document contents along
> with the untouched offending zip file in the archive that Tika-server returns
> as a result. Until zip file decryption is supported, this would always return
> the untouched zip file; once it is implemented, the untouched zip file should
> still be returned in cases where the wrong password was provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
