[ 
https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340662#comment-14340662
 ] 

Tyler Palsulich commented on TIKA-1509:
---------------------------------------

To reiterate the above and make the issues we're running into explicit, here 
is a list. Please correct or update it if I'm misunderstanding or leaving 
something out.

# Multiple Parsers may support any given file. So, users should be able to 
provide a strategy for which Parser is used or for how Parser results are 
merged.
# The default behavior when multiple Parsers support a file will be one of:
## Pick an initial Parser with _some strategy_. If it fails, keep trying 
additional Parsers.
## Run all Parsers and merge the results.
# If you're trying multiple Parsers, how should you merge the Metadata?
# If you're trying multiple Parsers, how should you merge what was written to 
the ContentHandler? A ContentHandler is fed information from the Parser while 
the Parser consumes the input stream. Possible answers:
## Give ContentHandlers a reset() method that drops all previously passed 
content.
## Make users pass in a ContentHandlerFactory, so each Parser can create a new 
ContentHandler when it starts parsing. This is essentially a reset in the form 
of creating a new ContentHandler.
# How do you reset the given InputStream when starting a new parse?
# How do container-aware Parsers factor into this?
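
To make the "try the next Parser on failure" option concrete, here is a rough 
sketch of how the factory idea and the stream-reset question could fit 
together. It is only an illustration under assumptions: SimpleParser, 
parseWithFallback and the StringBuilder "sink" are hypothetical stand-ins, not 
Tika's Parser or ContentHandler APIs, and buffering the whole input in a byte 
array is just one possible way to restart the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are Tika APIs. */
public class FallbackSketch {

    /** Stand-in for a Parser: reads the stream and writes text to a sink. */
    public interface SimpleParser {
        void parse(InputStream in, StringBuilder sink) throws IOException;
    }

    /**
     * Tries each parser in order. Every attempt gets a fresh sink from the
     * factory and a fresh stream over the buffered bytes, so nothing written
     * or consumed by a failed attempt leaks into the next one.
     */
    public static String parseWithFallback(byte[] data,
                                           List<SimpleParser> parsers,
                                           Supplier<StringBuilder> sinkFactory)
            throws IOException {
        IOException last = null;
        for (SimpleParser parser : parsers) {
            StringBuilder sink = sinkFactory.get();       // "reset" by re-creating
            try (InputStream in = new ByteArrayInputStream(data)) {
                parser.parse(in, sink);
                return sink.toString();                   // first success wins
            } catch (IOException e) {
                last = e;                                 // remember, try the next one
            }
        }
        throw new IOException("no parser succeeded", last);
    }

    public static void main(String[] args) throws IOException {
        // Writes partial output and consumes one byte before failing.
        SimpleParser failing = (in, sink) -> {
            sink.append("partial ");
            in.read();
            throw new IOException("unsupported format");
        };
        // Copies the whole stream into the sink as UTF-8 text.
        SimpleParser working = (in, sink) ->
                sink.append(new String(in.readAllBytes(), StandardCharsets.UTF_8));

        String out = parseWithFallback(
                "hello".getBytes(StandardCharsets.UTF_8),
                List.of(failing, working),
                StringBuilder::new);
        System.out.println(out); // prints "hello"
    }
}
```

The partial output of the failing parser is discarded because its sink is 
simply thrown away, which is the same effect a reset() would have. The sketch 
does not address Metadata merging or container-aware Parsers.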

> Create configurable strategies for composite parsers
> ----------------------------------------------------
>
>                 Key: TIKA-1509
>                 URL: https://issues.apache.org/jira/browse/TIKA-1509
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>
> Several parsers can handle the same mime type, and we are currently ordering 
> which parser is chosen (roughly) by the alphabetic order of the parser class 
> name.
> Let's allow users to configure strategies for picking parsers.
> See and contribute to full discussion here: 
> http://wiki.apache.org/tika/CompositeParserDiscussion



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
