[ 
https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338538#comment-14338538
 ] 

Tim Allison commented on TIKA-1509:
-----------------------------------

To confirm I understand: is the use case for this recommendation that either 
there is an actual file behind the InputStream, or the user knows that the 
InputStream is short enough to be buffered, so there is no need to create temp 
files?  In other words, does this proposal offer an efficiency gain rather than 
completely new behavior?

I was thinking the more challenging part is how to reset the handlers in the 
cases where we don't want agglomeration of results across attempts.  In the 
fallback case in Nick's code, what happens if someone creates a ContentHandler 
that writes to an OutputStream, and the first parser writes something to the 
OutputStream before failing?  Would we want to create a TikaOutputStream that 
writes the output to a temp file?  Or, to get started, could we require that 
the ParserDecorator only take a {{ResettableContentHandler}}?  Or could we 
require that the user send in a ContentHandlerFactory that creates a new 
ContentHandler for each attempt at parsing?  Argh, but the factory would also 
have to be called on exception to flush/close the ContentHandlers it 
generated, which is effectively {{reset()}}.
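
To make the factory option concrete, here is a rough sketch of what I have in 
mind.  It is not Tika code: {{SimpleParser}}, {{BufferingHandler}} and 
{{parseWithFallback}} are hypothetical names made up for illustration, and the 
handler buffers in memory rather than writing to an OutputStream or temp file.  
It only shows the idea that each attempt gets a fresh handler, so partial 
output from a failed parser is dropped instead of leaking into the result.

```java
import java.util.List;
import java.util.function.Supplier;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FallbackSketch {

    /** Hypothetical stand-in for a parser; this is NOT Tika's Parser interface. */
    interface SimpleParser {
        void parse(String input, ContentHandler handler) throws SAXException;
    }

    /** Hypothetical handler that buffers character events in memory. */
    static class BufferingHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length);
        }

        String text() {
            return buffer.toString();
        }
    }

    /**
     * Tries each parser in order. The factory is asked for a new handler per
     * attempt, so anything a failed parser wrote is discarded with its handler.
     */
    static String parseWithFallback(String input,
                                    List<SimpleParser> parsers,
                                    Supplier<BufferingHandler> factory)
            throws SAXException {
        SAXException last = null;
        for (SimpleParser parser : parsers) {
            BufferingHandler handler = factory.get(); // fresh handler per attempt
            try {
                parser.parse(input, handler);
                return handler.text(); // first success wins
            } catch (SAXException e) {
                last = e; // partial output in this handler is simply dropped
            }
        }
        throw last != null ? last : new SAXException("no parsers configured");
    }

    public static void main(String[] args) throws SAXException {
        // First parser emits some output and then fails.
        SimpleParser failing = (in, h) -> {
            char[] partial = "garbage".toCharArray();
            h.characters(partial, 0, partial.length);
            throw new SAXException("simulated failure");
        };
        // Second parser succeeds and echoes its input.
        SimpleParser working = (in, h) -> {
            char[] chars = in.toCharArray();
            h.characters(chars, 0, chars.length);
        };
        String out = parseWithFallback("hello", List.of(failing, working),
                BufferingHandler::new);
        System.out.println(out); // prints "hello"
    }
}
```

A handler that writes straight to a caller-supplied OutputStream cannot be 
discarded this way, which is why the temp file or {{reset()}} question above 
still stands for that case.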

> Create configurable strategies for composite parsers
> ----------------------------------------------------
>
>                 Key: TIKA-1509
>                 URL: https://issues.apache.org/jira/browse/TIKA-1509
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>
> Several parsers can handle the same mime type, and we are currently ordering 
> which parser is chosen (roughly) by the alphabetic order of the parser class 
> name.
> Let's allow users to configure strategies for picking parsers.
> See and contribute to full discussion here: 
> http://wiki.apache.org/tika/CompositeParserDiscussion



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
