Pavel Arnošt created TIKA-2818:
----------------------------------

             Summary: RarParser throws EncryptedDocumentException only when 
whole archive is encrypted
                 Key: TIKA-2818
                 URL: https://issues.apache.org/jira/browse/TIKA-2818
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.20
            Reporter: Pavel Arnošt
         Attachments: rar4_encrypted_content_only.rar

RarParser throws EncryptedDocumentException only if the whole archive is encrypted. 
If encryption is set on individual files instead, the parser fails with 
org.apache.tika.exception.TikaException: RarParser Exception:

Caused by: org.apache.tika.exception.TikaException: RarParser Exception
 at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:99)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
 at ... 43 more
Caused by: com.github.junrar.exception.RarException: ioError
 at com.github.junrar.Archive.getInputStream(Archive.java:525)
 at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:81)
 ... 48 more
Caused by: com.github.junrar.exception.RarException: crcError
 at com.github.junrar.Archive.doExtractFile(Archive.java:557)
 at com.github.junrar.Archive.extractFile(Archive.java:498)
 at com.github.junrar.Archive.getInputStream(Archive.java:523)
 ... 49 more

File encryption should be checked before attempting to extract content (line 79 
of RarParser.java), like this:

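FileHeader header = rar.nextFileHeader();

while (header != null && !Thread.currentThread().isInterrupted()) {
    // Encryption can be set per file, so check every entry
    // (not only the first) before extracting its content.
    if (header.isEncrypted()) {
        throw new EncryptedDocumentException();
    }
    // ... existing extraction code, unchanged ...
}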
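The per-entry check can be modeled outside Tika to show the intended control flow. This is a minimal sketch under stated assumptions, not RarParser code: `ArchiveEntry` is a hypothetical stand-in for junrar's `FileHeader`, and `EncryptedEntryException` is a hypothetical stand-in for Tika's `EncryptedDocumentException`. Because the check runs on every entry inside the loop, an archive whose only encrypted member comes after plain ones is still reported as encrypted instead of reaching the extraction step.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for junrar's FileHeader: only the two
// properties this sketch needs.
final class ArchiveEntry {
    final String name;
    final boolean encrypted;

    ArchiveEntry(String name, boolean encrypted) {
        this.name = name;
        this.encrypted = encrypted;
    }
}

// Hypothetical stand-in for Tika's EncryptedDocumentException.
class EncryptedEntryException extends Exception {
    EncryptedEntryException(String entryName) {
        super("Entry is encrypted: " + entryName);
    }
}

final class PerEntryCheck {
    // Walks the entries in order and fails as soon as it meets an
    // encrypted one, before that entry's content is "extracted".
    // Returns the names of the entries that were extracted.
    static List<String> extractAll(Iterator<ArchiveEntry> entries)
            throws EncryptedEntryException {
        List<String> extracted = new ArrayList<>();
        while (entries.hasNext() && !Thread.currentThread().isInterrupted()) {
            ArchiveEntry entry = entries.next();
            if (entry.encrypted) {
                throw new EncryptedEntryException(entry.name);
            }
            extracted.add(entry.name); // placeholder for real extraction
        }
        return extracted;
    }
}
```

With entries `a.txt` (plain) followed by `secret.txt` (encrypted), `extractAll` throws with the message `Entry is encrypted: secret.txt`; with only plain entries it returns their names in order. The trade-off of failing fast is that content already extracted from earlier entries is abandoned.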

Alternatively, the error could be recorded in the metadata under the 
TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM key. I'm not sure which is 
preferable, but the current behaviour is not correct: parsing of the whole archive fails.
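The metadata alternative can be sketched as follows. This is a minimal, hypothetical model, not Tika code: `RarEntry` stands in for junrar's `FileHeader`, a plain `Map` stands in for Tika's `Metadata`, and the key string is a placeholder for `TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM` (its real value is not assumed here). Encrypted entries are noted and skipped, so the remaining entries are still extracted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for junrar's FileHeader.
final class RarEntry {
    final String name;
    final boolean encrypted;

    RarEntry(String name, boolean encrypted) {
        this.name = name;
        this.encrypted = encrypted;
    }
}

final class RecordAndContinue {
    // Hypothetical placeholder for
    // TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM.
    static final String EMBEDDED_EXCEPTION_KEY = "embedded_stream_exception";

    // Extracts every plain entry; for each encrypted entry, appends a
    // message under EMBEDDED_EXCEPTION_KEY and moves on instead of failing.
    static List<String> extractAll(Iterator<RarEntry> entries,
                                   Map<String, List<String>> metadata) {
        List<String> extracted = new ArrayList<>();
        while (entries.hasNext() && !Thread.currentThread().isInterrupted()) {
            RarEntry entry = entries.next();
            if (entry.encrypted) {
                metadata.computeIfAbsent(EMBEDDED_EXCEPTION_KEY, k -> new ArrayList<>())
                        .add("Encrypted entry skipped: " + entry.name);
                continue;
            }
            extracted.add(entry.name); // placeholder for real extraction
        }
        return extracted;
    }
}
```

With entries `a.txt`, `secret.txt` (encrypted) and `b.txt`, this returns `a.txt` and `b.txt` and records one message for `secret.txt`. The trade-off is the reverse of failing fast: the rest of the archive is still indexed, but callers must inspect the metadata to learn that something was skipped.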

A sample document is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)