[jira] [Commented] (TIKA-3750) Bug in sorting parsers

Tim Allison (Jira) Wed, 04 May 2022 09:21:17 -0700


    [ https://issues.apache.org/jira/browse/TIKA-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531813#comment-17531813 ]


Tim Allison commented on TIKA-3750:
-----------------------------------

The problem is that we *do* want custom detectors to come first. So the 
solution is not simply to have the sort put custom resources last: that would 
fix the parser issue but break the detector sorting.

Should DefaultParser check whether a mime type already has a parser before 
overwriting it?  That way we keep the same sort order across Parsers, 
Detectors, and whatever else uses ServiceLoaderUtils.
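
To make the two behaviors concrete, here is a minimal, self-contained sketch. It does not use Tika's actual classes: ParserOrderSketch, FakeParser, overwrite, checkFirst, and sampleParsers are hypothetical names made up for illustration, and the map of mime type to class name only approximates what DefaultParser builds. It assumes the list is already sorted with the custom parser first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParserOrderSketch {

    // Hypothetical stand-in for a parser: a class name plus its supported mime types.
    static final class FakeParser {
        final String className;
        final List<String> mimeTypes;

        FakeParser(String className, List<String> mimeTypes) {
            this.className = className;
            this.mimeTypes = mimeTypes;
        }
    }

    // Assumed sort order: custom (non-org.apache.tika) parser first.
    static List<FakeParser> sampleParsers() {
        return Arrays.asList(
                new FakeParser("com.acme.parser.PDFParser",
                        Arrays.asList("application/pdf")),
                new FakeParser("org.apache.tika.parser.pdf.PDFParser",
                        Arrays.asList("application/pdf")));
    }

    // Behavior described in the issue: a later parser replaces an earlier one.
    static Map<String, String> overwrite(List<FakeParser> sorted) {
        Map<String, String> byType = new LinkedHashMap<>();
        for (FakeParser p : sorted) {
            for (String type : p.mimeTypes) {
                byType.put(type, p.className);
            }
        }
        return byType;
    }

    // Proposed behavior: keep the first parser registered for a mime type.
    static Map<String, String> checkFirst(List<FakeParser> sorted) {
        Map<String, String> byType = new LinkedHashMap<>();
        for (FakeParser p : sorted) {
            for (String type : p.mimeTypes) {
                byType.putIfAbsent(type, p.className);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        List<FakeParser> sorted = sampleParsers();
        // prints org.apache.tika.parser.pdf.PDFParser
        System.out.println(overwrite(sorted).get("application/pdf"));
        // prints com.acme.parser.PDFParser
        System.out.println(checkFirst(sorted).get("application/pdf"));
    }
}
```

With the check in place, the custom-first sort stays as it is, so detector ordering would not need to change.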

> Bug in sorting parsers
> ----------------------
>
>                 Key: TIKA-3750
>                 URL: https://issues.apache.org/jira/browse/TIKA-3750
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Major
>
> Throughout our documentation and unit tests, we declare that parsers with a 
> different namespace than org.apache.tika should come first.  The problem is 
> that the DefaultParser iterates through the list of parsers and overwrites 
> parsers based on supported mime types.
> So, if there's a custom parser {{com.acme.parser.PDFParser}} that supports 
> {{application/pdf}}, that will be added to the map of parsers in 
> DefaultParser first and then overwritten by org.apache.tika's PDFParser.
> We should instead sort non-o.a.t. parsers last, no?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
