[
https://issues.apache.org/jira/browse/TIKA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057026#comment-17057026
]
Tim Allison commented on TIKA-2714:
-----------------------------------
Sounds good. What should we call it: {{<mime-type
type="application/x-rar-compressed;version=5"}}?
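For reference, a minimal sketch of how a versioned type could be told apart by its leading bytes, assuming the failing archives are RAR 5 files (the report below does not confirm this). The class name, the method name, and the "version=4" type string are hypothetical and used only for illustration; the "version=5" string is just the name proposed above, and none of this is existing Tika code. Only the two published RAR signatures are relied on.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika or junrar.
class RarSignatureSniffer {
    // RAR 1.5 to 4.x signature: "Rar!" 0x1A 0x07 0x00
    private static final byte[] RAR4 = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00};
    // RAR 5 signature: "Rar!" 0x1A 0x07 0x01 0x00
    private static final byte[] RAR5 = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x01, 0x00};

    // Returns a type name based on the first bytes of the file.
    // The returned strings are assumptions, not registered Tika types.
    static String sniff(byte[] head) {
        if (startsWith(head, RAR5)) {
            return "application/x-rar-compressed;version=5";
        }
        if (startsWith(head, RAR4)) {
            return "application/x-rar-compressed;version=4";
        }
        return "application/octet-stream";
    }

    // True if data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

With a distinct type for version 5, such files could be routed away from a parser that only handles the older format instead of failing inside it. Self-extracting archives carry the signature at a later offset, so this sketch would not catch them.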
> Tika Parse Errors for certain attachments
> -----------------------------------------
>
> Key: TIKA-2714
> URL: https://issues.apache.org/jira/browse/TIKA-2714
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: Suman Moorthy
> Priority: Major
>
> Tika fails to parse certain attachments that our customers send to our
> application.
> We got a sample RAR file from our customer that fails parsing. It contains
> only simple PDF files, and we were able to reproduce the issue.
> However, if we create a new RAR file from the same contents (using WinRAR)
> and try to parse it, parsing succeeds.
> The RAR file that our customer used is not encrypted or corrupted. We are not
> sure why their RAR file fails parsing while a new RAR file with the same
> contents succeeds.
> Can you please provide a solution or feedback to this problem?
>
> Below is the exception thrown when we try to parse the rar file attachment
> from our customer:
>
> Aug 02, 2018 5:04:09 AM com.github.junrar.Archive setFile
> WARNING: exception in archive constructor maybe file is encrypted or currupt
> com.github.junrar.exception.RarException: badRarArchive
> at com.github.junrar.Archive.readHeaders(Archive.java:250)
> at com.github.junrar.Archive.setFile(Archive.java:136)
> at com.github.junrar.Archive.setVolume(Archive.java:581)
> at com.github.junrar.Archive.<init>(Archive.java:108)
> at com.github.junrar.Archive.<init>(Archive.java:113)
> at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:72)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
> at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.pkg.RarParser@1372ed45
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
> 05:04:09.488 [main] DEBUG com.actiance.platform.commons.spi.FileReaderUtils - Deleted Temp File - 0a44423c-6fad-47e6-943b-7b56178b0b7f.tmp
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
> at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> Caused by: java.lang.NullPointerException: mainheader is null
> at com.github.junrar.Archive.isEncrypted(Archive.java:206)
> at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:74)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> ... 4 more
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)