[ 
https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram updated TIKA-1212:
-------------------------

    Description: 
Please refer to the code:
http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
Requirement:
-----------------
abc.zip
   ---> a.doc
   ---> b.xls
   ---> pqr.zip
  -------------> m.ppt
There are two issues with Tika:
1. How can extraction of embedded documents be blocked separately (optionally)?
2. When I extract recursively, the file name (resourceKeyName) is not reported
properly. For example:
    --> a.doc should have the value abc.zip/a.doc, and similarly for b.xls. This
is fine, BUT m.ppt gets the resource file name pqr/m.ppt, which is WRONG. It
should have the value abc.zip/pqr.zip/m.ppt.
    --> For embedded documents, only a random name is reported, without the
proper file path.



  was:
Please refer to the code:
http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
Requirement:
-----------------
abc.zip
   ---> a.doc
   ----> b.xls
  -----> pqr.zip
               ---> m.ppt
There are two issues with Tika:
1. How can extraction of embedded documents be blocked separately (optionally)?
2. When I extract recursively, the file name (resourceKeyName) is not reported
properly. For example:
    --> a.doc should have the value abc.zip/a.doc, and similarly for b.xls. This
is fine, BUT m.ppt gets the resource file name pqr/m.ppt, which is WRONG. It
should have the value abc.zip/pqr.zip/m.ppt.
    --> For embedded documents, only a random name is reported, without the
proper file path.




> Recursive Extraction of Archive File
> ------------------------------------
>
>                 Key: TIKA-1212
>                 URL: https://issues.apache.org/jira/browse/TIKA-1212
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Vikram
>            Priority: Critical
>
> Please refer to the code:
> http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
> Requirement:
> -----------------
> abc.zip
>    ---> a.doc
>    ---> b.xls
>    ---> pqr.zip
>   -------------> m.ppt
> There are two issues with Tika:
> 1. How can extraction of embedded documents be blocked separately (optionally)?
> 2. When I extract recursively, the file name (resourceKeyName) is not reported
> properly. For example:
>     --> a.doc should have the value abc.zip/a.doc, and similarly for b.xls. This
> is fine, BUT m.ppt gets the resource file name pqr/m.ppt, which is WRONG. It
> should have the value abc.zip/pqr.zip/m.ppt.
>     --> For embedded documents, only a random name is reported, without the
> proper file path.
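
To make the expected naming in point 2 concrete, here is a minimal sketch of
the path-building behaviour the report asks for. It deliberately does not use
Tika at all: it relies only on java.util.zip, and the class and method names
(NestedArchivePaths, listPaths, zip) are hypothetical, chosen for this
illustration. It only shows how a container path such as
abc.zip/pqr.zip/m.ppt can be carried down through nested archives, and it
assumes nested archives can be recognised by a ".zip" suffix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class NestedArchivePaths {

    /** Returns the full container path of every non-archive entry, recursing into nested zips. */
    public static List<String> listPaths(String containerPath, InputStream in) throws IOException {
        List<String> paths = new ArrayList<>();
        // Not closed on purpose: closing would also close the parent stream during recursion.
        ZipInputStream zin = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            // Prefix the entry name with the path of everything that contains it.
            String fullPath = containerPath + "/" + entry.getName();
            if (entry.getName().toLowerCase().endsWith(".zip")) {
                // Assumption: a ".zip" suffix marks a nested archive; read it from the current entry.
                paths.addAll(listPaths(fullPath, zin));
            } else {
                paths.add(fullPath);
            }
        }
        return paths;
    }

    /** Builds an in-memory zip so the example needs no files on disk. */
    public static byte[] zip(String[] names, byte[][] contents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (int i = 0; i < names.length; i++) {
                zout.putNextEntry(new ZipEntry(names[i]));
                zout.write(contents[i]);
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] dummy = "x".getBytes();
        // Recreate the layout from the report: abc.zip holds a.doc, b.xls and pqr.zip (with m.ppt).
        byte[] pqr = zip(new String[] {"m.ppt"}, new byte[][] {dummy});
        byte[] abc = zip(new String[] {"a.doc", "b.xls", "pqr.zip"},
                         new byte[][] {dummy, dummy, pqr});
        for (String path : listPaths("abc.zip", new ByteArrayInputStream(abc))) {
            System.out.println(path);
        }
    }
}
```

For the layout described above this prints abc.zip/a.doc, abc.zip/b.xls and
abc.zip/pqr.zip/m.ppt, one per line. The point of the design is that the
parent's path is passed down as an argument on each recursive call, rather
than being rebuilt from the entry name alone.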



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
