[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

Tim Allison (JIRA) Mon, 27 Oct 2014 11:36:14 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185574#comment-14185574
 ]


Tim Allison commented on TIKA-1445:
-----------------------------------

I played with this a bit using a PNG test file.

The problem there is that, besides the TesseractOCRParser, both the GDALParser 
and the ImageParser process PNG files.  So there's no way to guarantee that 
whichever "other" parser gets picked actually extracts metadata.

One hack would be to hardcode a check against only the ImageParser or the 
JpegParser to see if either one matches the type.

A better option would be something along the lines of the service-loading 
pattern we already use with AutoDetectReader.

The user could specify ImageMetadataParsers in a service listing, and we would 
try each one in turn to see if there is a match on type.
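
To make the service listing idea concrete, here is a rough sketch using the 
JDK's java.util.ServiceLoader.  Everything named here is an assumption for 
illustration: the ImageMetadataParser interface, its getSupportedTypes() 
method, the FixedTypeParser stand-in and the ImageMetadataParsers helper are 
hypothetical and are not existing Tika classes.  Types are plain strings 
rather than Tika MediaType objects to keep the sketch self-contained.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.Set;

/** Hypothetical SPI for this sketch; not an existing Tika interface. */
interface ImageMetadataParser {
    /** Media types (e.g. "image/png") this parser claims to extract metadata for. */
    Set<String> getSupportedTypes();
}

/** Trivial stand-in implementation, only here to make the sketch runnable. */
final class FixedTypeParser implements ImageMetadataParser {
    private final Set<String> types;

    FixedTypeParser(String type) {
        this.types = Collections.singleton(type);
    }

    @Override
    public Set<String> getSupportedTypes() {
        return types;
    }
}

/** Hypothetical helper that picks a metadata parser by type. */
final class ImageMetadataParsers {
    private ImageMetadataParsers() {}

    /**
     * Loads the implementations named in the provider file
     * META-INF/services/ImageMetadataParser on the classpath
     * (the file is named after the interface's fully qualified name).
     */
    static Iterable<ImageMetadataParser> fromServiceListing() {
        return ServiceLoader.load(ImageMetadataParser.class);
    }

    /** Tries each candidate in iteration order; returns the first that matches the type. */
    static Optional<ImageMetadataParser> firstMatch(
            Iterable<ImageMetadataParser> candidates, String mediaType) {
        for (ImageMetadataParser candidate : candidates) {
            if (candidate.getSupportedTypes().contains(mediaType)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A caller such as the Tesseract parser could then ask for 
ImageMetadataParsers.firstMatch(ImageMetadataParsers.fromServiceListing(), 
"image/png") and skip metadata extraction when the result is empty.  Because 
the user controls the listing, only parsers known to extract metadata would 
ever be tried.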


> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, 
> TIKA-1445.Palsulich.102614.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, 
> consider how to add back in the metadata extraction capabilities by the other 
> Image parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
