Add Option to Exclude Embedded Files' Text from Text Content
------------------------------------------------------------
Key: TIKA-819
URL: https://issues.apache.org/jira/browse/TIKA-819
Project: Tika
Issue Type: New Feature
Components: general
Affects Versions: 1.0
Environment: Windows-7 + JDK 1.6 u26
Reporter: Albert L.
Fix For: 1.1
It would be nice to be able to exclude the text content of embedded files.
For example, if I have a DOCX with an embedded PPTX, I would like an option
to keep the PPTX text from showing up when asking for the text content of the
DOCX. In other words, it would be nice to have the option to get text content
*only* from the DOCX instead of from the DOCX+PPTX.
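To make the requested behavior concrete, here is a minimal sketch in Java. It
is only a toy model and not Tika's API: `Doc`, `TextExtractor`, and the
`includeEmbedded` flag are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not a Tika class): a document that has its own text
// and may contain embedded documents.
final class Doc {
    final String name;
    final String ownText;
    final List<Doc> embedded = new ArrayList<Doc>();

    Doc(String name, String ownText) {
        this.name = name;
        this.ownText = ownText;
    }

    // Attach an embedded document and return this container for chaining.
    Doc embed(Doc child) {
        embedded.add(child);
        return this;
    }
}

// Hypothetical extractor (not a Tika class) showing the requested option.
final class TextExtractor {
    // includeEmbedded is the assumed switch: when false, only the
    // container's own text is returned and embedded documents are skipped.
    static String extract(Doc doc, boolean includeEmbedded) {
        StringBuilder out = new StringBuilder(doc.ownText);
        if (includeEmbedded) {
            for (Doc child : doc.embedded) {
                // Recurse so nested embedded documents are included too.
                out.append('\n').append(extract(child, true));
            }
        }
        return out.toString();
    }
}
```

With a DOCX-like container that embeds a PPTX-like child,
`TextExtractor.extract(docx, false)` would return only the container's own
text, while `TextExtractor.extract(docx, true)` would append the embedded text
as well.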