[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563250#comment-17563250 ]
Tim Allison commented on TIKA-3812:
-----------------------------------
I think the above behavior is actually an improvement in 2.4.1. If you have
{{tika-parser-scientific-package}} on your classpath, I think you'd want its
GDALParser to run instead of the ImageParser and Tesseract, no? Or are you
interested in other parsers in the scientific package and do not want GDAL?
Further, if you have exiftool installed, it now gets triggered on mp4 files,
which should be the desired behavior.
What do you think?
> Parser Order: image get parsed by GDALParser instead of TesseractOCRParser
> --------------------------------------------------------------------------
>
> Key: TIKA-3812
> URL: https://issues.apache.org/jira/browse/TIKA-3812
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.4.1
> Reporter: Eugen Caruntu
> Priority: Minor
> Fix For: 2.4.2
>
> Attachments: parser-diffs.tgz
>
>
> The selected parser seems to be different in 2.4.1. For example, an image
> (jpg/png) that was previously (2.4.0) processed by TesseractOCRParser now
> gets parsed by GDALParser.
> It seems that when multiple parsers support the same file types, the selected
> parser depends on the order in which they get loaded.
> For example, the GDALParser, ImageParser and TesseractOCRParser all support
> image/jpeg, image/png, image/gif ...
> A recent change (TIKA-3750) reverses the parser order.
> Re-configuring the GDALParser to exclude the image mime types might work,
> but there could be other parsers with overlapping mime types.
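
To illustrate the order dependence described in the report, here is a minimal
Java sketch. It is a toy model, not Tika's actual code: the class
ParserOrderSketch, the ToyParser type and the buildLookup method are
hypothetical names, and the rule that a later parser overwrites an earlier one
for the same mime type is an assumption made for illustration. The load order
shown is also assumed, not taken from either release.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParserOrderSketch {

    /** Hypothetical stand-in for a parser: a name plus the mime types it claims. */
    static final class ToyParser {
        final String name;
        final List<String> mimeTypes;

        ToyParser(String name, List<String> mimeTypes) {
            this.name = name;
            this.mimeTypes = mimeTypes;
        }
    }

    /**
     * Builds a mime type to parser name lookup by walking the parsers in order.
     * Assumption: when two parsers claim the same type, the later one wins.
     */
    static Map<String, String> buildLookup(List<ToyParser> parsers) {
        Map<String, String> lookup = new LinkedHashMap<>();
        for (ToyParser parser : parsers) {
            for (String type : parser.mimeTypes) {
                lookup.put(type, parser.name); // overwrites any earlier claim
            }
        }
        return lookup;
    }

    public static void main(String[] args) {
        List<String> imageTypes = List.of("image/jpeg", "image/png", "image/gif");

        // Assumed load order, for illustration only.
        List<ToyParser> order = new ArrayList<>(List.of(
                new ToyParser("GDALParser", imageTypes),
                new ToyParser("ImageParser", imageTypes),
                new ToyParser("TesseractOCRParser", imageTypes)));

        // Last parser in the list wins: prints "TesseractOCRParser".
        System.out.println(buildLookup(order).get("image/jpeg"));

        // Reversing the list changes the winner: prints "GDALParser".
        Collections.reverse(order);
        System.out.println(buildLookup(order).get("image/jpeg"));
    }
}
```

In this model nothing about the parsers themselves changes between the two
runs; only the iteration order does, which is enough to switch the parser
selected for every shared mime type.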
--
This message was sent by Atlassian Jira
(v8.20.10#820010)