[ 
https://issues.apache.org/jira/browse/TIKA-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640350#comment-14640350
 ] 

Tim Allison commented on TIKA-1689:
-----------------------------------

To confirm [~dwarren]'s point and to add some detail: it turns out this 
change happened in some cleanup for TIKA-1517 
([r1677328|http://svn.apache.org/viewvc?view=revision&revision=1677328]).  
Before that patch, we had this in DefaultParser:

{noformat}
            public int compare(Parser p1, Parser p2) {
                String n1 = p1.getClass().getName();
                String n2 = p2.getClass().getName();
                boolean t1 = n1.startsWith("org.apache.tika.");
                boolean t2 = n2.startsWith("org.apache.tika.");
                if (t1 == t2) {
                    return n1.compareTo(n2);
                } else if (t1) {
                    return -1;
                } else {
                    return 1;
                }
            }
{noformat}

When that was pulled into ServiceLoaderUtils, it became:
{noformat}
            public int compare(T c1, T c2) {
                String n1 = c1.getClass().getName();
                String n2 = c2.getClass().getName();
                boolean t1 = n1.startsWith("org.apache.tika.");
                boolean t2 = n2.startsWith("org.apache.tika.");
                if (t1 == t2) {
                    return n1.compareTo(n2);
                } else if (t1) {
                    return 1;
                } else {
                    return -1;
                }
            }
{noformat}

[~gagravarr], can you think of any problems if I flip the -1/1 back to where 
they were?
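
To make the effect of the sign flip concrete, here is a minimal sketch that runs both orderings side by side. It is a simplification, not the actual Tika code: it compares class names as plain strings instead of calling getClass().getName() on live instances, and the names SortOrderSketch, TIKA_FIRST, TIKA_LAST and com.example.CustomHtmlParser are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderSketch {

    private static final String TIKA_PREFIX = "org.apache.tika.";

    // Ordering as in the old DefaultParser snippet: org.apache.tika.* sorts first.
    static final Comparator<String> TIKA_FIRST = (n1, n2) -> {
        boolean t1 = n1.startsWith(TIKA_PREFIX);
        boolean t2 = n2.startsWith(TIKA_PREFIX);
        if (t1 == t2) {
            return n1.compareTo(n2);
        }
        return t1 ? -1 : 1;
    };

    // Ordering as in the ServiceLoaderUtils snippet: signs flipped, org.apache.tika.* sorts last.
    static final Comparator<String> TIKA_LAST = (n1, n2) -> {
        boolean t1 = n1.startsWith(TIKA_PREFIX);
        boolean t2 = n2.startsWith(TIKA_PREFIX);
        if (t1 == t2) {
            return n1.compareTo(n2);
        }
        return t1 ? 1 : -1;
    };

    // Returns a sorted copy of the given class names.
    static List<String> sorted(Comparator<String> order, String... names) {
        List<String> list = new ArrayList<>(Arrays.asList(names));
        list.sort(order);
        return list;
    }

    public static void main(String[] args) {
        String tika = "org.apache.tika.parser.html.HtmlParser";
        String custom = "com.example.CustomHtmlParser"; // hypothetical third-party parser

        System.out.println("old order: " + sorted(TIKA_FIRST, custom, tika));
        System.out.println("new order: " + sorted(TIKA_LAST, custom, tika));
    }
}
```

With the old ordering the Tika parser comes first and the third-party parser last; with the flipped signs it is the other way around, which matches the reversal described in the issue.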

> Parser sort order change in TIKA-1517 breaks parser override capability
> -----------------------------------------------------------------------
>
>                 Key: TIKA-1689
>                 URL: https://issues.apache.org/jira/browse/TIKA-1689
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.9
>            Reporter: David Warren
>            Priority: Blocker
>
> In Tika 1.9, the comparator used to sort parsers (in ServiceLoaderUtils) now 
> returns them in the reverse order from how they were returned in prior 
> versions, when the comparator was in DefaultParser.  This work was done under 
> TIKA-1517.
> This change broke one of our customizations in which we use our own parser 
> instead of Tika's HtmlParser to process HTML.  We use the service loader 
> logic (creating our own META-INF/services/org.apache.tika.parser.Parser file) 
> and rely on the order in which the list returned by 
> DefaultParser.getDefaultParsers() is evaluated.  We expect that when Tika 
> builds the map of mime types to parsers, it first puts in entries for 
> HtmlParser, then overwrites these with our custom parser.  
> I realize relying on this is brittle, and I found that a valid workaround 
> in Tika 1.9 is to blacklist HtmlParser.  However, in case this parser 
> ordering change was not intentional, I figured I'd mention it.
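
The override behaviour the report relies on can be sketched as a "last writer wins" map build. This only illustrates the mechanism as the reporter describes it and is not Tika's actual implementation. FakeParser, buildTypeMap and the parser names are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OverrideSketch {

    // Hypothetical stand-in for a parser: a name plus the MIME types it claims.
    static final class FakeParser {
        final String name;
        final List<String> types;

        FakeParser(String name, String... types) {
            this.name = name;
            this.types = Arrays.asList(types);
        }
    }

    // Walks the parsers in list order; a later parser replaces an earlier
    // one that registered the same MIME type.
    static Map<String, String> buildTypeMap(List<FakeParser> parsers) {
        Map<String, String> byType = new LinkedHashMap<>();
        for (FakeParser parser : parsers) {
            for (String type : parser.types) {
                byType.put(type, parser.name);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        FakeParser tika = new FakeParser("HtmlParser", "text/html");
        FakeParser custom = new FakeParser("CustomHtmlParser", "text/html");

        // Tika parser first, custom parser last: the custom parser wins.
        System.out.println(buildTypeMap(Arrays.asList(tika, custom)));
        // Reversed order: Tika's HtmlParser wins instead.
        System.out.println(buildTypeMap(Arrays.asList(custom, tika)));
    }
}
```

Under this assumption, whichever parser appears later in the sorted list ends up handling text/html, so reversing the sort silently undoes the override.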



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
