[ 
https://issues.apache.org/jira/browse/TIKA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057111#comment-17057111
 ] 

Tim Allison commented on TIKA-2714:
-----------------------------------

{noformat}
  <mime-type type="application/x-rar-compressed">
    <_comment>RAR archive</_comment>
    <alias type="application/x-rar"/>
    <magic priority="50">
      <match value="Rar!" type="string" offset="0"/>
      <match value="\x52\x61\x72\x21\x1a" type="string" offset="0"/>
    </magic>
    <glob pattern="*.rar"/>
  </mime-type>
  <mime-type type="application/x-rar-compressed;version=4">
    <_comment>RAR archive</_comment>
    <magic priority="50">
      <match value="\x52\x61\x72\x21\x1a\x07\x00" type="string" offset="0"/>
    </magic>
    <sub-class-of type="application/x-rar-compressed"/>
  </mime-type>
  <mime-type type="application/x-rar-compressed;version=5">
    <_comment>RAR archive</_comment>
    <magic priority="50">
      <match value="\x52\x61\x72\x21\x1a\x07\x01\x00" type="string" offset="0"/>
    </magic>
    <sub-class-of type="application/x-rar-compressed"/>
  </mime-type>
{noformat}
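
For illustration, the magic-byte rules above can be approximated in plain Java. This is only a sketch of the prefix matching those `<match>` entries describe, not Tika's actual detector; the class name `RarSignatureSketch`, the method names, and the `application/octet-stream` fallback are assumptions made up for this example. The byte values are copied from the definitions above.

```java
public class RarSignatureSketch {
    // Signatures copied from the <match> values in the mime-type definitions above.
    private static final byte[] RAR_GENERIC = {0x52, 0x61, 0x72, 0x21};                   // "Rar!"
    private static final byte[] RAR4 = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x00};
    private static final byte[] RAR5 = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x01, 0x00};

    // Returns true if data begins with every byte of prefix (offset 0).
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Checks the most specific signatures first, then falls back to the generic type.
    static String detect(byte[] header) {
        if (startsWith(header, RAR5)) {
            return "application/x-rar-compressed;version=5";
        }
        if (startsWith(header, RAR4)) {
            return "application/x-rar-compressed;version=4";
        }
        if (startsWith(header, RAR_GENERIC)) {
            return "application/x-rar-compressed";
        }
        return "application/octet-stream"; // hypothetical fallback for this sketch
    }

    public static void main(String[] args) {
        byte[] v5Header = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x01, 0x00, 0x33};
        System.out.println(detect(v5Header));
    }
}
```

The version 4 and version 5 signatures differ at the seventh byte (`0x00` versus `0x01`), so a header can match at most one of them; anything that starts with `Rar!` but matches neither falls through to the parent type.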

> Tika Parse Errors for certain attachments
> -----------------------------------------
>
>                 Key: TIKA-2714
>                 URL: https://issues.apache.org/jira/browse/TIKA-2714
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: Suman Moorthy
>            Priority: Major
>
> Tika fails to parse certain attachments that our customers send to our 
> application.
> We received a sample RAR file from a customer that fails to parse. It contains 
> only simple PDF files, and we were able to reproduce the issue.
> However, if we create a new RAR file from the same contents (using WinRAR) 
> and parse it, parsing succeeds. 
> The customer's RAR file is neither encrypted nor corrupted. We are not sure 
> why their file fails to parse while a new RAR file with the same contents 
> succeeds.
> Can you please provide a solution or feedback on this problem?
>  
> Below is the exception thrown when we try to parse the rar file attachment 
> from our customer:
>  
> Aug 02, 2018 5:04:09 AM com.github.junrar.Archive setFile
> WARNING: exception in archive constructor maybe file is encrypted or currupt
> com.github.junrar.exception.RarException: badRarArchive
>      at com.github.junrar.Archive.readHeaders(Archive.java:250)
>      at com.github.junrar.Archive.setFile(Archive.java:136)
>      at com.github.junrar.Archive.setVolume(Archive.java:581)
>      at com.github.junrar.Archive.<init>(Archive.java:108)
>      at com.github.junrar.Archive.<init>(Archive.java:113)
>      at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:72)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>      at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
>      at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.pkg.RarParser@1372ed45
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
> 05:04:09.488 [main] DEBUG com.actiance.platform.commons.spi.FileReaderUtils - 
> Deleted Temp File - 0a44423c-6fad-47e6-943b-7b56178b0b7f.tmp
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>      at com.actiance.platform.sfab.cis.etl.documentProcessor.internal.DocumentProcessorImpl.getExtractedContent(DocumentProcessorImpl.java:160)
>      at test.TikaParserAPIExample.main(TikaParserAPIExample.java:31)
> Caused by: java.lang.NullPointerException: mainheader is null
>      at com.github.junrar.Archive.isEncrypted(Archive.java:206)
>      at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:74)
>      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>      ... 4 more
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
