Tim Allison created TIKA-3800:
---------------------------------

             Summary: Consider wrapping 'unrar' commandline executable as a 
parser to handle rar v5
                 Key: TIKA-3800
                 URL: https://issues.apache.org/jira/browse/TIKA-3800
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


Junrar is great and doesn't require any external dependencies.  However, it 
doesn't handle rar v5.  I've tried {{UNRAR 5.61 beta 1 freeware}} on some of 
the v5 files in our regression corpus, and I can confirm that unrar handles 
them while Tika does not.

The parser would need to create a temporary directory, copy the input stream 
to a file there, run unrar, process the extracted files, and then clean up the 
directory.
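
A minimal sketch of that workflow in plain Java follows; it is not an actual 
Tika parser.  The class name, the {{Extractor}} and {{ExtractedFileHandler}} 
interfaces, the 60 second timeout and the {{input.rar}} file name are all 
hypothetical, and {{runUnrar}} assumes an {{unrar}} binary on the PATH and 
that {{-or}} (rename automatically) is the switch we want.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: temp dir -> copy stream -> extract -> process -> clean up.
final class ExternalRarWorkflowSketch {

    /** Pluggable extraction step, so the workflow can be exercised without unrar. */
    interface Extractor {
        void extract(Path archive, Path outDir) throws IOException, InterruptedException;
    }

    /** Stand-in for handing each extracted file to an embedded-document parser. */
    interface ExtractedFileHandler {
        void handle(Path file) throws IOException;
    }

    /** Runs the whole workflow and returns the number of files handed to the handler. */
    static int process(InputStream in, Extractor extractor, ExtractedFileHandler handler)
            throws IOException, InterruptedException {
        Path workDir = Files.createTempDirectory("tika-unrar-");
        try {
            // Spool the stream to disk because unrar needs a real file.
            Path archive = workDir.resolve("input.rar");
            Files.copy(in, archive);
            Path outDir = Files.createDirectory(workDir.resolve("out"));

            extractor.extract(archive, outDir);

            List<Path> extracted;
            try (Stream<Path> files = Files.walk(outDir)) {
                extracted = files.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : extracted) {
                handler.handle(file);
            }
            return extracted.size();
        } finally {
            // Always remove the temporary directory, even if extraction failed.
            deleteRecursively(workDir);
        }
    }

    /** Assumed invocation: "unrar e -or <archive>", run with outDir as working directory. */
    static void runUnrar(Path archive, Path outDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("unrar", "e", "-or", archive.toString())
                .directory(outDir.toFile())
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!p.waitFor(60, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("unrar timed out");
        }
        if (p.exitValue() != 0) {
            throw new IOException("unrar exited with code " + p.exitValue());
        }
    }

    /** Deletes children before parents by visiting paths in reverse order. */
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : ordered) {
                Files.delete(p);
            }
        }
    }
}
```

With unrar installed, a call would look like 
{{ExternalRarWorkflowSketch.process(stream, ExternalRarWorkflowSketch::runUnrar, handler)}}.  
Keeping the extraction step behind an interface lets the temp-directory 
handling be tested with a fake extractor.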

We can get full path information from the {{l}} command: {{unrar l blah.rar}}

We can tell unrar not to overwrite files with the same name: {{unrar e -or 
bug_trackers/LIBRE_OFFICE/131138-137877/LIBRE_OFFICE-135119-0.rar}}.
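
The two invocations above could be driven from Java roughly as follows.  This 
is only a sketch: the class and method names are hypothetical, it assumes an 
{{unrar}} binary on the PATH, it assumes the second command uses unrar's 
{{-or}} (rename automatically) switch, and it assumes the output can be read 
as UTF-8.  It returns the raw output lines and does not try to parse the 
{{l}} listing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for building and running the two unrar commands.
final class UnrarCommandSketch {

    /** "unrar l <archive>": list the archive contents. */
    static List<String> listCommand(Path archive) {
        return List.of("unrar", "l", archive.toString());
    }

    /** "unrar e -or <archive>": extract, renaming files whose names clash (assumed switch). */
    static List<String> extractCommand(Path archive) {
        return List.of("unrar", "e", "-or", archive.toString());
    }

    /** Reads all lines from a stream; UTF-8 is an assumption about unrar's output. */
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Runs the command in workDir and returns its combined stdout/stderr lines. */
    static List<String> run(List<String> command, Path workDir)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .directory(workDir.toFile())
                .redirectErrorStream(true)
                .start();
        // Drain the output before waiting so the child cannot block on a full pipe.
        List<String> lines = readLines(p.getInputStream());
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with code " + exit + ": " + command);
        }
        return lines;
    }
}
```

Building the argument list separately from running it keeps the flags in one 
place and avoids any shell quoting of archive paths.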





--
This message was sent by Atlassian Jira
(v8.20.7#820007)
