David Warren created TIKA-1689:
----------------------------------
Summary: Parser sort order change in TIKA-1517 breaks parser
override capability
Key: TIKA-1689
URL: https://issues.apache.org/jira/browse/TIKA-1689
Project: Tika
Issue Type: Bug
Components: core
Affects Versions: 1.9
Reporter: David Warren
In Tika 1.9, the comparator used to sort parsers (now in ServiceLoaderUtils)
returns them in the reverse of the order used in prior versions, when the
comparator lived in DefaultParser. This change was made under TIKA-1517.
This change broke one of our customizations, in which we use our own parser
instead of Tika's HtmlParser to process HTML. We use the service loader logic
(creating our own META-INF/services/org.apache.tika.parser.Parser file) and
rely on the order in which the list returned by
DefaultParser.getDefaultParsers() is evaluated. We expect that when Tika
builds the map of mime types to parsers, it first puts in entries for
HtmlParser and then overwrites them with our custom parser.
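
To illustrate the mechanism described above, here is a minimal sketch. It does
not use any Tika classes: NamedParser, buildTypeMap and CustomHtmlParser are
hypothetical stand-ins, and the assumption is only that the map of mime types
to parsers is filled by iterating the sorted list, so the last parser to claim
a type wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParserOrderDemo {

    // Hypothetical stand-in for a parser: a name plus the mime types it claims.
    static final class NamedParser {
        final String name;
        final List<String> mimeTypes;

        NamedParser(String name, String... mimeTypes) {
            this.name = name;
            this.mimeTypes = Arrays.asList(mimeTypes);
        }
    }

    // Builds a mime type -> parser name map by iterating in list order.
    // A later parser claiming the same type overwrites the earlier entry.
    static Map<String, String> buildTypeMap(List<NamedParser> parsers) {
        Map<String, String> byType = new HashMap<>();
        for (NamedParser parser : parsers) {
            for (String type : parser.mimeTypes) {
                byType.put(type, parser.name);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        // Assumed pre-1.9 order: the built-in parser first, the custom one last.
        List<NamedParser> oldOrder = Arrays.asList(
                new NamedParser("HtmlParser", "text/html"),
                new NamedParser("CustomHtmlParser", "text/html"));

        // Assumed 1.9 order: the same list, reversed.
        List<NamedParser> newOrder = new ArrayList<>(oldOrder);
        Collections.reverse(newOrder);

        System.out.println(buildTypeMap(oldOrder).get("text/html")); // CustomHtmlParser
        System.out.println(buildTypeMap(newOrder).get("text/html")); // HtmlParser
    }
}
```

With the list reversed, the built-in parser is written last and silently
replaces the custom one, which matches the behaviour we see after upgrading.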
I realize that relying on this ordering is brittle, and I have found a valid
workaround in Tika 1.9, which is to blacklist HtmlParser. However, in case
this parser ordering change was not intentional, I figured I'd mention it.