[
https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338538#comment-14338538
]
Tim Allison commented on TIKA-1509:
-----------------------------------
To confirm I understand, is the goal/use case of this recommendation that there
is an actual file behind the InputStream, or that the user knows the InputStream
is short enough to be buffered, so there is no need to create temp files? In
other words, does this proposal offer an efficiency gain rather than completely
new behavior?

I was thinking the more challenging part was how to reset the handlers in the
cases where we wouldn't want agglomeration of results. In the fallback case in
Nick's code, what happens if someone creates a ContentHandler that writes to an
OutputStream, and the first parser writes something to the OutputStream before
failing? Would we want to create a TikaOutputStream that writes the output to
a temp file? Or, to get started, we could require that the ParserDecorator
only take a {{ResettableContentHandler}}. Or we could require that the user send
in a ContentHandlerFactory that will create a new ContentHandler for each
attempt at parsing. Argh, but the factory would also have to be called to
flush/close its generated ContentHandlers on exception, which is effectively
{{reset()}}.
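
To make the factory option concrete, here is a minimal sketch of the idea, not a proposed implementation. Everything in it is a hypothetical stand-in: {{SimpleParser}} is not Tika's Parser interface, a plain {{Supplier}} plays the role of the ContentHandlerFactory, and {{BufferingHandler}} keeps output in memory so that a failed attempt can simply be thrown away. Only {{org.xml.sax}} types from the JDK are used.

```java
import java.util.List;
import java.util.function.Supplier;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FallbackSketch {

    /** Hypothetical stand-in for a parser; NOT Tika's Parser interface. */
    interface SimpleParser {
        void parse(String input, ContentHandler handler) throws SAXException;
    }

    /** Buffers character events in memory so partial output can be discarded. */
    static class BufferingHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length);
        }

        String text() {
            return buffer.toString();
        }
    }

    /**
     * Tries each parser in order. Every attempt gets a fresh handler from the
     * factory, so output written before a failure never reaches the caller.
     */
    static String parseWithFallback(List<SimpleParser> parsers,
                                    Supplier<BufferingHandler> handlerFactory,
                                    String input) throws SAXException {
        SAXException last = null;
        for (SimpleParser parser : parsers) {
            BufferingHandler handler = handlerFactory.get(); // new handler per attempt
            try {
                parser.parse(input, handler);
                return handler.text();
            } catch (SAXException e) {
                last = e; // drop this attempt's handler along with its partial output
            }
        }
        throw last != null ? last : new SAXException("no parsers configured");
    }

    public static void main(String[] args) throws SAXException {
        // First parser emits some text and then fails.
        SimpleParser failing = (in, h) -> {
            h.characters("partial".toCharArray(), 0, 7);
            throw new SAXException("simulated failure");
        };
        // Second parser succeeds.
        SimpleParser working = (in, h) -> {
            char[] chars = ("recovered: " + in).toCharArray();
            h.characters(chars, 0, chars.length);
        };
        String out = parseWithFallback(List.of(failing, working), BufferingHandler::new, "doc");
        System.out.println(out); // prints "recovered: doc"
    }
}
```

The sketch sidesteps the OutputStream problem only because the handler buffers in memory. A handler that writes straight to a caller-supplied OutputStream would still leak the first parser's partial output, which is where the temp-file or {{reset()}} question comes back in.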
> Create configurable strategies for composite parsers
> ----------------------------------------------------
>
> Key: TIKA-1509
> URL: https://issues.apache.org/jira/browse/TIKA-1509
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> Several parsers can handle the same mime type, and we are currently ordering
> which parser is chosen (roughly) by the alphabetic order of the parser class
> name.
> Let's allow users to configure strategies for picking parsers.
> See and contribute to full discussion here:
> http://wiki.apache.org/tika/CompositeParserDiscussion