[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

Nick Burch (JIRA) Wed, 07 Jan 2015 03:51:56 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267553#comment-14267553
 ]


Nick Burch commented on TIKA-1445:
----------------------------------

I wonder if it wouldn't be better to do the "is Tesseract there" check in the 
`getSupportedTypes` method? That way, if Tesseract can't be found, the 
main composite parser (e.g. AutoDetectParser, if being used) would just skip over 
the Tesseract one, and fall back to the Jpeg or Image one as appropriate.

We could then do an additional check at parse time, in case of a direct call to 
the parser.
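
Roughly what I have in mind, as a self-contained sketch rather than the actual 
patch. Everything here is a simplified stand-in: `SimpleParser`, `OcrParser`, 
`ImageMetadataParser` and `parseWithFirstMatch` are hypothetical names, media 
types are plain strings, and the availability check is injected instead of 
really probing for the Tesseract binary. Tika's real Parser interface and 
composite parser lookup are richer than this.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

class OcrAvailabilitySketch {

    /** Hypothetical, simplified stand-in for a parser (not Tika's Parser interface). */
    interface SimpleParser {
        Set<String> getSupportedTypes();
        String parse(String mediaType);
    }

    /** OCR parser that only advertises its types when Tesseract is available. */
    static class OcrParser implements SimpleParser {
        private static final Set<String> TYPES = Collections.unmodifiableSet(
                new LinkedHashSet<>(Arrays.asList("image/jpeg", "image/png", "image/tiff")));

        // Assumption: the "is Tesseract there" check is supplied by the caller.
        private final BooleanSupplier tesseractAvailable;

        OcrParser(BooleanSupplier tesseractAvailable) {
            this.tesseractAvailable = tesseractAvailable;
        }

        @Override
        public Set<String> getSupportedTypes() {
            // First check: claim no types at all if Tesseract is missing,
            // so a composite parser never selects this parser.
            return tesseractAvailable.getAsBoolean() ? TYPES : Collections.<String>emptySet();
        }

        @Override
        public String parse(String mediaType) {
            // Second check: guards against a direct call that bypasses the composite.
            if (!tesseractAvailable.getAsBoolean()) {
                throw new IllegalStateException("Tesseract not found");
            }
            return "ocr:" + mediaType;
        }
    }

    /** Plain image parser that is always available. */
    static class ImageMetadataParser implements SimpleParser {
        @Override
        public Set<String> getSupportedTypes() {
            return Collections.singleton("image/jpeg");
        }

        @Override
        public String parse(String mediaType) {
            return "metadata:" + mediaType;
        }
    }

    /** Toy composite: hands the input to the first parser that claims the type. */
    static String parseWithFirstMatch(List<SimpleParser> parsers, String mediaType) {
        for (SimpleParser parser : parsers) {
            if (parser.getSupportedTypes().contains(mediaType)) {
                return parser.parse(mediaType);
            }
        }
        throw new IllegalArgumentException("No parser for " + mediaType);
    }

    public static void main(String[] args) {
        SimpleParser image = new ImageMetadataParser();
        // Tesseract missing: the OCR parser is skipped and the image parser is used.
        System.out.println(parseWithFirstMatch(
                Arrays.asList(new OcrParser(() -> false), image), "image/jpeg"));
        // Tesseract present: the OCR parser claims the type.
        System.out.println(parseWithFirstMatch(
                Arrays.asList(new OcrParser(() -> true), image), "image/jpeg"));
    }
}
```

The point of doing the check in both places is that the composite never sees 
the OCR parser when Tesseract is absent, while a direct `parse` call still 
fails loudly instead of trying to run a missing binary.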

I'll have a go at working that up shortly.

Oh, and the fallback parser you've come up with looks much neater than mine :)

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: 000003.doc, TIKA-1445.Mattmann.101214.patch.txt, 
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_20150106_tallison.patch, 
> TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, 
> TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, 
> consider how to add back in the metadata extraction capabilities by the other 
> Image parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
