[jira] [Commented] (TIKA-694) On extraction, get properties AND / OR content extraction

Nick Burch (JIRA) Tue, 23 Aug 2011 04:15:07 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089389#comment-13089389
 ]


Nick Burch commented on TIKA-694:
---------------------------------

For some parsers it may be possible to skip some parts if only metadata or text 
is required, but for many parsers there wouldn't be any savings. My hunch is 
that it's probably only the office-type formats where there would be a big 
change.

If we were to do this, I think the parse context is probably the right place 
for this flag.
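
As a rough illustration of that idea, here is a minimal sketch of a parser 
checking a flag carried in a class-keyed context. Tika's ParseContext offers 
set(Class, value) and get(Class, defaultValue) lookups of this shape; the 
Context class below is only a stand-in so the sketch runs without Tika on the 
classpath. ExtractionMode, Result and the toy parse method are hypothetical 
names invented for the example, not existing Tika API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are part of Tika.
class ExtractionFlagSketch {

    // Hypothetical flag a caller would place in the context.
    enum ExtractionMode { METADATA_ONLY, TEXT_ONLY, BOTH }

    // Stand-in for a class-keyed context in the style of ParseContext.
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key, T defaultValue) {
            Object value = entries.get(key);
            return value == null ? defaultValue : key.cast(value);
        }
    }

    // Holds whatever the toy parser chose to extract.
    static final class Result {
        final Map<String, String> metadata = new HashMap<>();
        final StringBuilder text = new StringBuilder();
    }

    // Toy parser: skips the part the caller did not ask for.
    static Result parse(Map<String, String> properties, String body, Context context) {
        // With no flag set, fall back to extracting everything.
        ExtractionMode mode = context.get(ExtractionMode.class, ExtractionMode.BOTH);
        Result result = new Result();
        if (mode != ExtractionMode.TEXT_ONLY) {
            result.metadata.putAll(properties);
        }
        if (mode != ExtractionMode.METADATA_ONLY) {
            // In a real parser this would be the expensive content walk.
            result.text.append(body);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("title", "Report");

        Context context = new Context();
        context.set(ExtractionMode.class, ExtractionMode.METADATA_ONLY);

        Result result = parse(properties, "document body", context);
        System.out.println("metadata entries: " + result.metadata.size());
        System.out.println("text length: " + result.text.length());
    }
}
```

Keeping the flag in the context means parsers that cannot save any work can 
simply ignore it, while the default (no flag set) keeps today's behaviour of 
extracting both metadata and text.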

> On extraction, get properties AND / OR content extraction
> ---------------------------------------------------------
>
>                 Key: TIKA-694
>                 URL: https://issues.apache.org/jira/browse/TIKA-694
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>    Affects Versions: 0.9
>         Environment: All OS
>            Reporter: Etienne Jouvin
>            Priority: Minor
>
> I use Tika to extract properties, and only properties, from Office files.
> The parser goes through the document content, which is not necessary and 
> slows down the process.
> It would be nice to have the choice to extract only properties or not.
> What I did was the following:
> I extended AutoDetectParser to override the parse method.
> Then, in the ParseContext instance, I put a flag set to boolean true to say 
> that only the properties should be extracted.
> For example, for Office files, I extended the OfficeParser class. In the 
> parse method, I check the flag and, if it is true, I skip all extraction of 
> the content.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
