[jira] [Updated] (TIKA-1212) Recursive Extraction of Archive File

Nick Burch (JIRA) Wed, 04 Jun 2014 08:12:33 -0700

     [ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Nick Burch updated TIKA-1212:
-----------------------------

    Attachment: RecursiveParsingExample.java

This is why we should have our examples pulled from svn, where we can check 
that they compile and run... (See the dev@ posts about this.)

I've fixed the logic on the wiki and attached an example program that lets 
you pick between the two wiki-based examples. We may want to put it into svn 
under 
tika-examples/src/main/java/org/apache/tika/examples/RecursiveParsingExample.java
at a later date. It does seem to work correctly now.

> Recursive Extraction of Archive File
> ------------------------------------
>
>                 Key: TIKA-1212
>                 URL: https://issues.apache.org/jira/browse/TIKA-1212
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Vikram
>            Priority: Critical
>         Attachments: RECURSIVE_PARSER_WRAPPER_HACK.patch, 
> RecursiveMetadataParserZukka.java, RecursiveParsingExample.java, 
> TIKA-Output.xlsx, abc.zip, abc.zip, test_recursive_embedded.docx
>
>
> Please refer to the code: 
> http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
> Requirement:
> -----------------
> abc.zip
>    ---> a.doc
>    ---> b.xls
>    ---> pqr.zip
>   -------------> m.ppt
> There are two issues with Tika:
> 1. How can separate extraction of embedded docs be optionally blocked?
> 2. When I extract recursively, the file name (or resourceKeyName) is not 
> reported correctly. For example:
>     --> a.doc should have the value abc.zip/a.doc, and similarly for b.xls. 
> This is fine, BUT m.ppt has the resource file name pqr/m.ppt, which is WRONG. 
> It should have the value abc.zip/pqr.zip/m.ppt.
>     --> Even for an embedded doc, only a random name is reported, without the 
> proper file path.
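
The path requirement in point 2 can be sketched without Tika at all. The following is a minimal illustration, assuming plain java.util.zip rather than Tika's parser API: the full container path is passed down on each recursive call, so a nested entry inherits every enclosing archive name. The class and method names (NestedPathSketch, walk, demoPaths) are hypothetical and are not taken from the attached RecursiveParsingExample.java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: tracks the full container path while recursing into nested zips.
public class NestedPathSketch {

    // Records every entry as containerPath + "/" + entryName, recursing into *.zip entries.
    static void walk(String containerPath, InputStream in, List<String> out) throws IOException {
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String fullPath = containerPath + "/" + entry.getName();
            out.add(fullPath);
            if (entry.getName().toLowerCase().endsWith(".zip")) {
                // Buffer the nested archive so the outer stream position is not disturbed.
                byte[] nested = readEntry(zip);
                walk(fullPath, new ByteArrayInputStream(nested), out);
            }
        }
    }

    // Reads the remainder of the current zip entry into memory.
    static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Builds an in-memory zip from parallel arrays of entry names and contents.
    static byte[] zip(String[] names, byte[][] contents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < names.length; i++) {
                zos.putNextEntry(new ZipEntry(names[i]));
                zos.write(contents[i]);
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Recreates the abc.zip layout from the report (with dummy file contents) and walks it.
    static List<String> demoPaths() {
        try {
            byte[] dummy = new byte[] {0};
            byte[] pqr = zip(new String[] {"m.ppt"}, new byte[][] {dummy});
            byte[] abc = zip(new String[] {"a.doc", "b.xls", "pqr.zip"},
                             new byte[][] {dummy, dummy, pqr});
            List<String> paths = new ArrayList<>();
            walk("abc.zip", new ByteArrayInputStream(abc), paths);
            return paths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demoPaths().forEach(System.out::println);
    }
}
```

Run against the layout above, the sketch lists m.ppt as abc.zip/pqr.zip/m.ppt, which is the value the report asks for. The same idea of carrying the parent path into each nested parse would presumably apply inside a Tika embedded-document handler, but that wiring is not shown here.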



--
This message was sent by Atlassian JIRA
(v6.2#6252)
