Vikram created TIKA-1212:
----------------------------
Summary: Recursive Extraction of Archive File
Key: TIKA-1212
URL: https://issues.apache.org/jira/browse/TIKA-1212
Project: Tika
Issue Type: Bug
Reporter: Vikram
Priority: Critical
Please refer to the code:
http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
Requirement:
-----------------
abc.zip
---> a.doc
----> b.xls
-----> pqr.zip
---> m.ppt
There are two issues with Tika:
1. How can I optionally block embedded documents from being extracted separately?
2. When I extract recursively, the file name (or resourceKeyName) is not
reported correctly. For example:
--> a.doc should have the value abc.zip/a.doc, and similarly for b.xls. This is
fine, BUT m.ppt gets the resource file name pqr/m.ppt, which is WRONG. It
should have the value abc.zip/pqr.zip/m.ppt.
--> For embedded documents, only a random name is reported, without the
proper file path.
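
To make the expected naming in point 2 concrete, here is a minimal sketch of carrying the container path down through recursive extraction. It uses only java.util.zip, not Tika's parser or metadata API, so it illustrates the desired output rather than how Tika works internally. The class name NestedPathSketch, the helper methods, and the archive layout (a.doc, b.xls and pqr.zip directly inside abc.zip, with m.ppt inside pqr.zip) are assumptions inferred from the expected values above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Tika.
public class NestedPathSketch {

    // Walks a zip stream and records every entry's full path, prefixed
    // with the path of each enclosing archive (passed in as parentPath).
    static void collectPaths(InputStream in, String parentPath, List<String> out)
            throws IOException {
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String fullPath = parentPath + "/" + entry.getName();
            out.add(fullPath);
            if (entry.getName().toLowerCase().endsWith(".zip")) {
                // Buffer the nested archive, then recurse with the full
                // path (not just the nested archive's own name) as prefix.
                byte[] nested = zip.readAllBytes();
                collectPaths(new ByteArrayInputStream(nested), fullPath, out);
            }
        }
    }

    // Builds an in-memory zip from parallel arrays of names and contents.
    static byte[] zipOf(String[] names, byte[][] contents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < names.length; i++) {
                zos.putNextEntry(new ZipEntry(names[i]));
                zos.write(contents[i]);
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Recreates the assumed abc.zip layout in memory and lists its paths.
    static List<String> demo() {
        try {
            byte[] dummy = "x".getBytes(StandardCharsets.UTF_8);
            byte[] pqr = zipOf(new String[] {"m.ppt"}, new byte[][] {dummy});
            byte[] abc = zipOf(
                    new String[] {"a.doc", "b.xls", "pqr.zip"},
                    new byte[][] {dummy, dummy, pqr});
            List<String> paths = new ArrayList<>();
            collectPaths(new ByteArrayInputStream(abc), "abc.zip", paths);
            return paths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The point of the sketch is that the prefix handed to the recursive call is the full path accumulated so far, which is what yields abc.zip/pqr.zip/m.ppt instead of pqr/m.ppt.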
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)