On extraction, get properties AND / OR content extraction
---------------------------------------------------------
Key: TIKA-694
URL: https://issues.apache.org/jira/browse/TIKA-694
Project: Tika
Issue Type: Wish
Components: parser
Affects Versions: 0.9
Environment: All OS
Reporter: Etienne Jouvin
Priority: Minor
I use Tika only to extract properties from Office files.
The parser also walks through the document content, which is unnecessary for
this use case and slows down the process.
It would be nice to be able to choose between extracting only the properties
and extracting the content as well.
What I did was the following:
I extended AutoDetectParser to override the parse method.
Then, in the ParseContext instance, I put a boolean flag set to true to request
properties-only extraction.
For Office files, for example, I extended the OfficeParser class. In its parse
method, I check the flag and, if it is true, skip all content extraction.
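The approach above can be modelled without Tika as a minimal, self-contained sketch. It assumes a simplified document made of a property map plus body text. `Context`, `MetadataOnly`, `SimpleDocument`, `PropertiesAwareParser` and `PropertiesOnlyAutoDetect` are hypothetical names, not Tika API. `Context` only mimics the type-keyed `set`/`get` style of Tika's ParseContext, and the two parser classes play the roles of the extended OfficeParser and AutoDetectParser respectively.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a type-keyed context (mimics the ParseContext set/get style).
final class Context {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        values.put(key, value);
    }

    <T> T get(Class<T> key) {
        return key.cast(values.get(key)); // returns null when the key is absent
    }
}

// Hypothetical flag: when present and enabled, only properties are extracted.
final class MetadataOnly {
    final boolean enabled;

    MetadataOnly(boolean enabled) {
        this.enabled = enabled;
    }
}

// Hypothetical simplified document: a property map plus body text.
final class SimpleDocument {
    final Map<String, String> properties;
    final String body;

    SimpleDocument(Map<String, String> properties, String body) {
        this.properties = properties;
        this.body = body;
    }
}

// Plays the role of the extended OfficeParser: checks the flag before touching the content.
final class PropertiesAwareParser {
    static void parse(SimpleDocument doc, Map<String, String> metadata,
                      StringBuilder content, Context context) {
        metadata.putAll(doc.properties); // properties are always extracted
        MetadataOnly flag = context.get(MetadataOnly.class);
        if (flag != null && flag.enabled) {
            return; // properties only: skip the content walk entirely
        }
        content.append(doc.body); // stands in for the (slow) content extraction
    }
}

// Plays the role of the extended AutoDetectParser: records the caller's choice in the context.
final class PropertiesOnlyAutoDetect {
    static void parse(SimpleDocument doc, Map<String, String> metadata, StringBuilder content,
                      Context context, boolean propertiesOnly) {
        context.set(MetadataOnly.class, new MetadataOnly(propertiesOnly));
        PropertiesAwareParser.parse(doc, metadata, content, context);
    }
}
```

Keying the flag by its own class keeps it from colliding with other objects stored in the same context, and a parser that does not know about the flag simply never looks it up, so it keeps extracting everything. In real Tika the flag class would be a custom addition, since 0.9 defines no such option.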
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira