Need API to get list of embedded documents
------------------------------------------
Key: TIKA-637
URL: https://issues.apache.org/jira/browse/TIKA-637
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.0
Reporter: Manish
Apache Tika works great for extracting the content and metadata of documents.
It would be even better if it had an API that also gave you each embedded
document's input stream along with its content and metadata.
For example, when extracting a zip file, Tika could return a list of
<text, metadata, inputstream> entries, one per embedded document, or invoke a
callback for each <text, metadata, inputstream>. The same API could then be
used both for text extraction and for pulling individual documents out of
container files.
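To make the callback idea concrete, here is a minimal sketch in plain Java.
It is not an existing Tika API: the names EmbeddedDocumentCallback,
ZipEmbeddedWalker, walk and sampleZip, and the "name" and "length" metadata
keys, are made up for illustration. It only uses java.util.zip, and it leaves
text extraction to whatever the callback does with the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical callback, invoked once per embedded document (not a Tika interface).
interface EmbeddedDocumentCallback {
    void onDocument(Map<String, String> metadata, InputStream content) throws IOException;
}

class ZipEmbeddedWalker {

    // Walks a zip container and passes each entry's metadata and bytes to the callback.
    static void walk(InputStream container, EmbeddedDocumentCallback callback) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(container)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no document content
                }
                // Buffer the entry so the callback gets an independent stream
                // that stays valid after the walker moves to the next entry.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                byte[] bytes = buffer.toByteArray();

                // Illustrative metadata keys; a real API would use its own metadata type.
                Map<String, String> metadata = new LinkedHashMap<>();
                metadata.put("name", entry.getName());
                metadata.put("length", String.valueOf(bytes.length));

                callback.onDocument(metadata, new ByteArrayInputStream(bytes));
            }
        }
    }

    // Builds a small in-memory zip with two text entries, for demonstration only.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("world".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print the name and size of every embedded document in the sample zip.
        walk(new ByteArrayInputStream(sampleZip()), (metadata, content) ->
                System.out.println(metadata.get("name") + " (" + metadata.get("length") + " bytes)"));
    }
}
```

Buffering each entry in memory keeps the sketch simple but would not suit
large attachments; a real implementation would more likely stream each entry
or spool it to a temporary file.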
I have already implemented this for zip and PST files, but a standard API
for it in Tika would be great.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira