[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566943#comment-17566943 ]

Tilman Hausherr commented on TIKA-3812:
---------------------------------------

Build fails on my machine (Windows 10):
{noformat}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.979 s <<< FAILURE! - in org.apache.tika.parser.scientific.integration.TestParsers
[ERROR] org.apache.tika.parser.scientific.integration.TestParsers.testDiffsFrom241  Time elapsed: 0.951 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <class org.apache.tika.parser.mp4.MP4Parser> but was: <class org.apache.tika.parser.external.CompositeExternalParser>
        at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
        at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
        at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
        at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
        at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
        at org.apache.tika.parser.scientific.integration.TestParsers.testDiffsFrom241(TestParsers.java:66)
 {noformat}

> Parser Order: image get parsed by GDALParser instead of TesseractOCRParser
> --------------------------------------------------------------------------
>
>                 Key: TIKA-3812
>                 URL: https://issues.apache.org/jira/browse/TIKA-3812
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.1
>            Reporter: Eugen Caruntu
>            Priority: Minor
>         Attachments: parser-diffs.tgz
>
>
> The selected parser seems to differ in 2.4.1. For example, an image (jpg/png)
> that was processed by TesseractOCRParser in 2.4.0 now gets parsed by
> GDALParser.
> It seems that when multiple parsers support the same file types, the selected
> parser depends on the order in which they are loaded.
> For example, GDALParser, ImageParser and TesseractOCRParser all support
> image/jpeg, image/png, image/gif, ...
> A recent change (TIKA-3750) reverses the parser order.
> Reconfiguring GDALParser to exclude the image MIME types might work, but other
> parsers may overlap on the same types in the same way.
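To illustrate the order dependence described in the report, here is a minimal Java sketch of a hypothetical selection rule in which, when several parsers claim the same MIME type, the one registered last wins. The `Parser` record, `select`, `excludeTypes` and the `application/x-example-raster` type are invented for illustration; this is not Tika's CompositeParser code, and the exact tie-breaking rule Tika applies is an assumption here. The sketch shows how reversing the load order changes the winner for image/jpeg, and how removing image types from GDALParser's claims (the workaround suggested in the report) makes the result independent of order for those types.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of MIME-type-based parser selection.
// This is NOT Tika's CompositeParser; the "last registered wins" rule is an assumption.
public class ParserOrderSketch {

    // Hypothetical stand-in for a parser: a name plus the MIME types it claims.
    public record Parser(String name, Set<String> types) {}

    // Maps each MIME type to a parser name. When several parsers claim the
    // same type, later entries in the list overwrite earlier ones.
    public static Map<String, String> select(List<Parser> parsers) {
        Map<String, String> byType = new LinkedHashMap<>();
        for (Parser p : parsers) {
            for (String type : p.types()) {
                byType.put(type, p.name());
            }
        }
        return byType;
    }

    // Returns a copy of the parser that no longer claims the excluded types,
    // roughly mirroring the "exclude the image MIME types" workaround.
    public static Parser excludeTypes(Parser p, Set<String> excluded) {
        Set<String> kept = new HashSet<>(p.types());
        kept.removeAll(excluded);
        return new Parser(p.name(), Set.copyOf(kept));
    }

    public static void main(String[] args) {
        // "application/x-example-raster" is a made-up placeholder type.
        Parser gdal = new Parser("GDALParser",
                Set.of("image/jpeg", "image/png", "application/x-example-raster"));
        Parser ocr = new Parser("TesseractOCRParser", Set.of("image/jpeg", "image/png"));

        // Same parsers, opposite load order: a different winner for image/jpeg.
        System.out.println(select(List.of(gdal, ocr)).get("image/jpeg")); // TesseractOCRParser
        System.out.println(select(List.of(ocr, gdal)).get("image/jpeg")); // GDALParser

        // With image types excluded from GDALParser, only OCR claims image/jpeg.
        Parser gdalNoImages = excludeTypes(gdal, Set.of("image/jpeg", "image/png"));
        System.out.println(select(List.of(ocr, gdalNoImages)).get("image/jpeg")); // TesseractOCRParser
        System.out.println(select(List.of(ocr, gdalNoImages)).get("application/x-example-raster")); // GDALParser
    }
}
```

Under this assumed rule the winner depends only on list order, which matches the symptom reported; whether Tika resolves ties exactly this way should be checked against its own parser-loading code.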



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
