[
https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267553#comment-14267553
]
Nick Burch commented on TIKA-1445:
----------------------------------
I wonder if it wouldn't be better to do the "is Tesseract there" check in the
`getSupportedTypes` method? That way, if Tesseract can't be found, the
main composite parser (e.g. AutoDetectParser, if being used) would just skip over
the Tesseract one and fall back to the Jpeg or Image one as appropriate.
We could then do an additional check at parse time, in case of a direct call to
the parser.
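A rough sketch of that idea, assuming simplified stand-ins rather than the real
Tika interfaces: `SketchParser`, `OcrSketchParser`, `ImageSketchParser` and
`CompositeSketchParser` are hypothetical names, media types are plain strings,
and the availability check is injected as a `BooleanSupplier` instead of
actually probing for the tesseract binary. Throwing on a direct call is also
just an assumption about what the parse-time check should do.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical, cut-down stand-in for a parser interface.
interface SketchParser {
    Set<String> getSupportedTypes();
    String parse(String input);
}

// OCR parser that advertises no types at all when tesseract is missing.
class OcrSketchParser implements SketchParser {
    private static final Set<String> TYPES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("image/png", "image/jpeg", "image/tiff")));

    // Assumption: the caller supplies the "is tesseract there" check.
    private final BooleanSupplier tesseractAvailable;

    OcrSketchParser(BooleanSupplier tesseractAvailable) {
        this.tesseractAvailable = tesseractAvailable;
    }

    @Override
    public Set<String> getSupportedTypes() {
        // First check: an empty set makes a composite parser skip this one.
        return tesseractAvailable.getAsBoolean() ? TYPES : Collections.<String>emptySet();
    }

    @Override
    public String parse(String input) {
        // Second check: covers a direct call that bypasses getSupportedTypes.
        if (!tesseractAvailable.getAsBoolean()) {
            throw new IllegalStateException("tesseract not found");
        }
        return "ocr:" + input;
    }
}

// Plain image parser used as the fallback.
class ImageSketchParser implements SketchParser {
    private static final Set<String> TYPES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("image/png", "image/jpeg")));

    @Override
    public Set<String> getSupportedTypes() {
        return TYPES;
    }

    @Override
    public String parse(String input) {
        return "image-metadata:" + input;
    }
}

// Picks the first parser in the list that claims the given type.
class CompositeSketchParser {
    private final List<SketchParser> parsers;

    CompositeSketchParser(List<SketchParser> parsers) {
        this.parsers = parsers;
    }

    String parse(String type, String input) {
        for (SketchParser p : parsers) {
            if (p.getSupportedTypes().contains(type)) {
                return p.parse(input);
            }
        }
        throw new IllegalArgumentException("no parser for " + type);
    }
}
```

With the OCR parser listed first, a composite built with `() -> false` as the
check would hand an `image/png` input to the image parser, while `() -> true`
would route it to the OCR one.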
I'll have a go at working that up shortly.
Oh, and the fallback parser you've come up with looks much neater than mine :)
> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
> Key: TIKA-1445
> URL: https://issues.apache.org/jira/browse/TIKA-1445
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.8
>
> Attachments: 000003.doc, TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_20150106_tallison.patch,
> TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch,
> TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back in the metadata extraction capabilities by the other
> Image parsers.