[ https://issues.apache.org/jira/browse/TIKA-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089389#comment-13089389 ]
Nick Burch commented on TIKA-694:
---------------------------------
For some parsers it may be possible to skip some parts if only metadata or text
is required, but for many parsers there wouldn't be any savings. My hunch is
that it's probably only the office-type formats where there would be a big
change.

If we were to do this, I think the parse context is probably the right place
for this flag.
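
To make the idea concrete, here is a rough sketch of how a parser might read
such a flag from the context. Everything in it is hypothetical and only for
illustration: ExtractionMode, SketchContext and SketchOfficeParser are not
Tika classes. SketchContext is a small stand-in that mimics the class-keyed
set/get style of ParseContext so the example runs without Tika on the
classpath, and the "parser" just copies strings instead of reading a real
file.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical flag type; the class itself serves as the context key.
enum ExtractionMode { METADATA_ONLY, METADATA_AND_CONTENT }

// Stand-in for a class-keyed parse context (an assumption for this sketch,
// modelled on the set(Class, value) / get(Class, default) pattern).
final class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value == null ? defaultValue : key.cast(value);
    }
}

// Hypothetical parser that skips the content step when asked to.
final class SketchOfficeParser {
    static Map<String, String> parse(String title, String body, SketchContext context) {
        // With no flag set, fall back to extracting everything,
        // so existing callers see no change in behaviour.
        ExtractionMode mode =
                context.get(ExtractionMode.class, ExtractionMode.METADATA_AND_CONTENT);

        Map<String, String> result = new LinkedHashMap<>();
        result.put("title", title); // metadata is always extracted

        if (mode != ExtractionMode.METADATA_ONLY) {
            result.put("content", body); // the expensive part in a real parser
        }
        return result;
    }
}

final class FlagDemo {
    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        // No flag set: metadata and content are both extracted.
        System.out.println(SketchOfficeParser.parse("Report", "Body text", context));

        // Flag set: the content step is skipped.
        context.set(ExtractionMode.class, ExtractionMode.METADATA_ONLY);
        System.out.println(SketchOfficeParser.parse("Report", "Body text", context));
    }
}
```

Parsers that cannot save anything would simply ignore the flag, which is why
defaulting to full extraction seems like the safer choice.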
> On extraction, get properties AND / OR content extraction
> ---------------------------------------------------------
>
> Key: TIKA-694
> URL: https://issues.apache.org/jira/browse/TIKA-694
> Project: Tika
> Issue Type: Wish
> Components: parser
> Affects Versions: 0.9
> Environment: All OS
> Reporter: Etienne Jouvin
> Priority: Minor
>
> I use Tika to extract properties only, from Office files.
> The parser goes through the document content, which is not necessary and
> slows down the process.
> It would be nice to be able to choose whether to extract only the properties.
> What I did was the following:
> I extended AutoDetectParser to override the parse method.
> Then, in the ParseContext instance, I put a boolean flag set to true to say
> that only the properties should be extracted.
> For Office files, for example, I extended the OfficeParser class. In the parse
> method, I check the flag and, if it is true, skip all of the content
> extraction.