[
https://issues.apache.org/jira/browse/TIKA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057111#comment-17057111
]
Tim Allison edited comment on TIKA-2714 at 3/11/20, 3:31 PM:
-------------------------------------------------------------
{noformat}
<mime-type type="application/x-rar-compressed">
<_comment>RAR archive</_comment>
<alias type="application/x-rar"/>
<magic priority="60">
<match value="Rar!" type="string" offset="0"/>
<match value="\x52\x61\x72\x21\x1a" type="string" offset="0"/>
</magic>
<glob pattern="*.rar"/>
</mime-type>
<mime-type type="application/x-rar-compressed;version=4">
<_comment>RAR archive</_comment>
<magic priority="50">
<match value="\x52\x61\x72\x21\x1a\x07\x00" type="string" offset="0"/>
</magic>
<sub-class-of type="application/x-rar-compressed"/>
</mime-type>
<mime-type type="application/x-rar-compressed;version=5">
<_comment>RAR archive</_comment>
<magic priority="50">
<match value="\x52\x61\x72\x21\x1a\x07\x01\x00" type="string" offset="0"/>
</magic>
<sub-class-of type="application/x-rar-compressed"/>
</mime-type>
{noformat}
was (Author: [email protected]):
{noformat}
<mime-type type="application/x-rar-compressed">
<_comment>RAR archive</_comment>
<alias type="application/x-rar"/>
<magic priority="50">
<match value="Rar!" type="string" offset="0"/>
<match value="\x52\x61\x72\x21\x1a" type="string" offset="0"/>
</magic>
<glob pattern="*.rar"/>
</mime-type>
<mime-type type="application/x-rar-compressed;version=4">
<_comment>RAR archive</_comment>
<magic priority="50">
<match value="\x52\x61\x72\x21\x1a\x07\x00" type="string" offset="0"/>
</magic>
<sub-class-of type="application/x-rar-compressed"/>
</mime-type>
<mime-type type="application/x-rar-compressed;version=5">
<_comment>RAR archive</_comment>
<magic priority="50">
<match value="\x52\x61\x72\x21\x1a\x07\x01\x00" type="string" offset="0"/>
</magic>
<sub-class-of type="application/x-rar-compressed"/>
</mime-type>
{noformat}
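To illustrate what the three magic definitions above distinguish, here is a minimal Java sketch that classifies a header by the same byte signatures. The class name `RarMagic` and the method `detect` are hypothetical and not part of Tika. The sketch simply tests the longest signature first; it does not reproduce how Tika's detector actually weighs `priority` or `sub-class-of`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Tika class: mirrors the <match> values in the
// mime-type definitions above and checks the most specific signature first.
class RarMagic {
    // "Rar!" followed by 0x1a: the generic signature from the base type.
    private static final byte[] RAR_GENERIC = {0x52, 0x61, 0x72, 0x21, 0x1a};
    // Signature listed under ";version=4".
    private static final byte[] RAR_V4 = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x00};
    // Signature listed under ";version=5".
    private static final byte[] RAR_V5 = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x01, 0x00};
    // Plain "Rar!" string match from the base type.
    private static final byte[] RAR_TEXT = "Rar!".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with every byte of prefix (offset 0, as in the XML).
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the most specific media type whose signature matches the header.
    static String detect(byte[] header) {
        if (startsWith(header, RAR_V5)) {
            return "application/x-rar-compressed;version=5";
        }
        if (startsWith(header, RAR_V4)) {
            return "application/x-rar-compressed;version=4";
        }
        if (startsWith(header, RAR_GENERIC) || startsWith(header, RAR_TEXT)) {
            return "application/x-rar-compressed";
        }
        // Fallback chosen for this sketch only.
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] v5Header = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x01, 0x00, 0x00};
        byte[] v4Header = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x00, 0x00, 0x00};
        System.out.println(detect(v5Header));
        System.out.println(detect(v4Header));
    }
}
```

A caller could use a check like this to branch on the archive version before handing the stream to a RAR parser. Whether that applies to the failing file in this issue is an assumption, since the report does not state the file's version.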
> Tika Parse Errors for certain attachments
> -----------------------------------------
>
> Key: TIKA-2714
> URL: https://issues.apache.org/jira/browse/TIKA-2714
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: Suman Moorthy
> Priority: Major
>
> Tika fails to parse certain attachments that our customers send to our
> application.
> We got a sample RAR file from a customer that fails parsing; it contains only
> simple PDF files, and we were able to reproduce the issue.
> However, if we create a new RAR file from the same contents (using WinRAR)
> and try to parse it, parsing succeeds.
> The customer's RAR file is neither encrypted nor corrupted. We are not sure
> why their file fails to parse while a new RAR file with the same contents
> succeeds.
> Can you please provide a solution or feedback on this problem?
>
> Below is the exception thrown when we try to parse the rar file attachment
> from our customer:
>
> Aug 02, 2018 5:04:09 AM com.github.junrar.Archive setFile
> WARNING: exception in archive constructor maybe file is encrypted or currupt
> com.github.junrar.exception.RarException: badRarArchive
> at com.github.junrar.Archive.readHeaders(Archive.java:250)
> at com.github.junrar.Archive.setFile(Archive.java:136)
> at com.github.junrar.Archive.setVolume(Archive.java:581)
> at com.github.junrar.Archive.<init>(Archive.java:108)
> at com.github.junrar.Archive.<init>(Archive.java:113)
> at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:72)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
> at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.RarParser@1372ed45
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
> 05:04:09.488 [main] DEBUG com.actiance.platform.commons.spi.FileReaderUtils - Deleted Temp File - 0a44423c-6fad-47e6-943b-7b56178b0b7f.tmp
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
> at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> Caused by: java.lang.NullPointerException: mainheader is null
> at com.github.junrar.Archive.isEncrypted(Archive.java:206)
> at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:74)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> ... 4 more
>
>