[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch updated TIKA-1212:
-----------------------------
Attachment: RecursiveParsingExample.java
This is why we should have our examples pulled from svn, where we can check that
they compile and run... (see the dev@ posts about this).
I've fixed the logic on the wiki. Attached is an example program that lets you
pick between the two different wiki-based examples. We may want to put it into
svn under
tika-examples/src/main/java/org/apache/tika/examples/RecursiveParsingExample.java
at a later date. It does seem to work correctly now.
> Recursive Extraction of Archive File
> ------------------------------------
>
> Key: TIKA-1212
> URL: https://issues.apache.org/jira/browse/TIKA-1212
> Project: Tika
> Issue Type: Bug
> Reporter: Vikram
> Priority: Critical
> Attachments: RECURSIVE_PARSER_WRAPPER_HACK.patch,
> RecursiveMetadataParserZukka.java, RecursiveParsingExample.java,
> TIKA-Output.xlsx, abc.zip, abc.zip, test_recursive_embedded.docx
>
>
> Please refer to the code:
> http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
> Requirement:
> -----------------
> abc.zip
> ---> a.doc
> ---> b.xls
> ---> pqr.zip
> -------------> m.ppt
> There are two issues with TIKA:
> 1. How can extraction of embedded documents be blocked separately, as an option?
> 2. When I extract recursively, the file name (resourceKeyName) is not reported
> correctly. For example:
> --> a.doc should have the value abc.zip/a.doc, and similarly for b.xls. This
> works, BUT m.ppt gets the resource name pqr/m.ppt, which is WRONG. It
> should be abc.zip/pqr.zip/m.ppt.
> --> For embedded documents, only a random name is reported, without even
> the proper file path.
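For reference, the naming asked for in point 2 comes down to carrying the container's full path down into each child. Below is a minimal, stdlib-only Java sketch of that idea using java.util.zip rather than Tika's parser API. The class NestedZipPaths, its methods and the recurse flag (a rough stand-in for the opt-out in point 1) are hypothetical, and the sketch assumes nested archives can be recognised by a .zip suffix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only: not part of Tika.
public class NestedZipPaths {

    /**
     * Lists every file in an archive by its full path, prefixed with the names of
     * all enclosing archives, e.g. "abc.zip/pqr.zip/m.ppt".
     * When recurse is false, nested archives are listed but not opened.
     */
    public static List<String> listPaths(String archiveName, byte[] archive, boolean recurse) {
        List<String> out = new ArrayList<>();
        try {
            walk(new ByteArrayInputStream(archive), archiveName, recurse, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    private static void walk(InputStream in, String parentPath, boolean recurse,
                             List<String> out) throws IOException {
        // Not closed on purpose: closing a ZipInputStream also closes the stream it
        // wraps, which for a nested archive is the enclosing archive's stream.
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            // A child's path is always its parent's full path plus its own name.
            String path = parentPath + "/" + entry.getName();
            out.add(path);
            if (recurse && entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // Read the nested archive straight from the current entry's bytes.
                walk(zis, path, recurse, out);
            }
        }
    }

    /** Builds an in-memory zip from alternating name/content pairs (String or byte[]). */
    public static byte[] buildZip(Object... namesAndContents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < namesAndContents.length; i += 2) {
                zos.putNextEntry(new ZipEntry((String) namesAndContents[i]));
                Object content = namesAndContents[i + 1];
                byte[] data = content instanceof byte[]
                        ? (byte[]) content
                        : ((String) content).getBytes(StandardCharsets.UTF_8);
                zos.write(data);
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Same layout as in the report: abc.zip -> a.doc, b.xls, pqr.zip -> m.ppt
        byte[] pqr = buildZip("m.ppt", "placeholder ppt bytes");
        byte[] abc = buildZip("a.doc", "placeholder doc bytes",
                              "b.xls", "placeholder xls bytes",
                              "pqr.zip", pqr);
        for (String path : listPaths("abc.zip", abc, true)) {
            System.out.println(path);
        }
    }
}
```

Running main should print abc.zip/a.doc, abc.zip/b.xls, abc.zip/pqr.zip and abc.zip/pqr.zip/m.ppt. The nested archive is read directly from the outer entry's stream instead of being extracted first, which is why the inner ZipInputStream is never closed.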
--
This message was sent by Atlassian JIRA
(v6.2#6252)