[
https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523768#comment-16523768
]
Lee Carpenter commented on TIKA-1509:
-------------------------------------
Regarding Luis' point about stream resets, I have a perfect test case. An Excel
macro-enabled template (application/vnd.ms-excel.template.macroenabled.12) should
be handled by org.apache.tika.parser.microsoft.ooxml.OOXMLParser, but if you look
at the stream, the "magic" bytes match application/msexcel, which would route it
to org.apache.tika.parser.microsoft.OfficeParser.
This was an email attachment. I was able to handle it with a "fallback", but that
requires sending the whole document over again to be re-parsed, so I would be
interested in being able to implement a fallback parser. The one snag is that the
"magic" is the same for a number of different MS Office file types, so telling
them apart relies on a valid content type or file extension.
Just some thoughts.
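A minimal sketch of the fallback idea, assuming the whole attachment fits in memory: read the stream once into a byte array, then replay it to each candidate parser in turn (declared-type parser first, magic-detected parser second), so nothing has to be re-sent. `SimpleParser`, `parseWithFallback` and `magicChecking` are hypothetical names for illustration only, not Tika API. The toy parsers just check headers: OOXML files are ZIP packages starting with `PK\x03\x04`, while legacy .xls files use the OLE2 signature `D0 CF 11 E0 A1 B1 1A E1`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

public class FallbackSketch {

    /** Hypothetical stand-in for a parser (not Tika's Parser interface). */
    interface SimpleParser {
        /** Returns a result, or throws IOException if the bytes are not a format it understands. */
        String parse(InputStream in) throws IOException;
    }

    /**
     * Tries each candidate in order, e.g. the parser chosen from the declared
     * content type first, then the one chosen from magic-byte detection.
     * The input is buffered once, so every attempt sees the stream from byte 0
     * and the caller never has to re-send the document.
     */
    static String parseWithFallback(InputStream in, List<SimpleParser> candidates) throws IOException {
        byte[] data = in.readAllBytes(); // assumption: the whole attachment fits in memory
        IOException last = null;
        for (SimpleParser p : candidates) {
            try {
                return p.parse(new ByteArrayInputStream(data)); // fresh stream acts as a "reset" to byte 0
            } catch (IOException e) {
                last = e; // remember why this candidate failed, then try the next one
            }
        }
        throw new IOException("no candidate parser accepted the document", last);
    }

    /** Toy parser: accepts the stream only if it starts with the given magic bytes. */
    static SimpleParser magicChecking(String name, byte[] magic) {
        return in -> {
            byte[] head = in.readNBytes(magic.length);
            if (!Arrays.equals(head, magic)) {
                throw new IOException(name + ": unexpected header");
            }
            return name;
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] ole2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
        byte[] zip = {0x50, 0x4B, 0x03, 0x04};
        // Declared type says OOXML (ZIP-based), but the bytes carry an OLE2 header.
        List<SimpleParser> candidates = List.of(magicChecking("OOXML-like", zip), magicChecking("OLE2-like", ole2));
        System.out.println(parseWithFallback(new ByteArrayInputStream(ole2), candidates)); // prints OLE2-like
    }
}
```

Buffering into a byte array is the simplest way to make the stream replayable; for large attachments, spooling to a temporary file or using a mark/reset-capable stream would be the more likely choice.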
> Create configurable strategies for composite parsers
> ----------------------------------------------------
>
> Key: TIKA-1509
> URL: https://issues.apache.org/jira/browse/TIKA-1509
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Priority: Major
>
> Several parsers can handle the same mime type, and we are currently ordering
> which parser is chosen (roughly) by the alphabetic order of the parser class
> name.
> Let's allow users to configure strategies for picking parsers.
> See and contribute to full discussion here:
> http://wiki.apache.org/tika/CompositeParserDiscussion
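A minimal sketch of what a configurable selection strategy could look like, assuming the composite parser already has the list of parser class names that claim a given MIME type. `SelectionStrategy`, `ALPHABETICAL`, `preferring` and the `com.example` parser names are hypothetical, not part of Tika's API or configuration. `ALPHABETICAL` only approximates the "(roughly) alphabetic" ordering described above by taking the first fully qualified name in natural order.

```java
import java.util.Comparator;
import java.util.List;

public class StrategySketch {

    /** Hypothetical strategy: picks one parser class name among those that claim a MIME type. */
    interface SelectionStrategy {
        String choose(List<String> candidates);
    }

    /** Approximation of the current behaviour: first class name in alphabetical order (throws if empty). */
    static final SelectionStrategy ALPHABETICAL =
            candidates -> candidates.stream().min(Comparator.naturalOrder()).orElseThrow();

    /** User-configured preference list; falls back to ALPHABETICAL when no preferred parser is a candidate. */
    static SelectionStrategy preferring(List<String> preferred) {
        return candidates -> preferred.stream()
                .filter(candidates::contains)
                .findFirst()
                .orElseGet(() -> ALPHABETICAL.choose(candidates));
    }

    public static void main(String[] args) {
        // Hypothetical parsers that both claim the same MIME type.
        List<String> candidates = List.of("com.example.ZetaPdfParser", "com.example.AlphaPdfParser");
        System.out.println(ALPHABETICAL.choose(candidates)); // prints com.example.AlphaPdfParser
        System.out.println(preferring(List.of("com.example.ZetaPdfParser")).choose(candidates)); // prints com.example.ZetaPdfParser
    }
}
```

Keeping the strategy behind a small interface means the default ordering stays unchanged unless a user explicitly configures a preference.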
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)