[jira] [Updated] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

Tyler Palsulich (JIRA) Sun, 26 Oct 2014 14:29:31 -0700

     [ 
https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tyler Palsulich updated TIKA-1445:
----------------------------------
    Attachment: TIKA-1445.Palsulich.102614.patch

Here is an updated patch implementing the idea above. I added a new public 
method to CompositeParser and DefaultParser -- {{getAllParsersFor(ParseContext, 
MediaType)}} -- which returns a list of all Parsers that support the given type. 
TesseractOCRParser then searches this list for a second Parser for the image 
being parsed.
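
To make the lookup concrete, here is a rough sketch of the idea. It is not the code in the attached patch and does not use the real Tika classes: SketchParser, the registered list, and findSecondParser are hypothetical stand-ins, and media types are plain strings rather than MediaType objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins only; this is NOT the Tika API or the attached patch.
final class ParserLookupSketch {

    /** Minimal stand-in for a parser: a name plus the media types it claims. */
    static final class SketchParser {
        final String name;
        final Set<String> supportedTypes;

        SketchParser(String name, String... types) {
            this.name = name;
            this.supportedTypes = new HashSet<>(Arrays.asList(types));
        }
    }

    /** Returns every registered parser that supports the type, in registration order. */
    static List<SketchParser> getAllParsersFor(List<SketchParser> registered, String mediaType) {
        List<SketchParser> matches = new ArrayList<>();
        for (SketchParser parser : registered) {
            if (parser.supportedTypes.contains(mediaType)) {
                matches.add(parser);
            }
        }
        return matches;
    }

    /** Returns the first matching parser other than the caller, or null if there is none. */
    static SketchParser findSecondParser(List<SketchParser> registered, String mediaType,
                                         SketchParser self) {
        for (SketchParser parser : getAllParsersFor(registered, mediaType)) {
            if (parser != self) {
                return parser;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SketchParser ocr = new SketchParser("ocr", "image/jpeg", "image/png");
        SketchParser jpeg = new SketchParser("jpeg-metadata", "image/jpeg");
        List<SketchParser> registered = Arrays.asList(ocr, jpeg);

        SketchParser second = findSecondParser(registered, "image/jpeg", ocr);
        System.out.println(second == null ? "no second parser" : second.name);
    }
}
```

Skipping the caller by identity keeps the OCR parser from selecting itself; when no other parser claims the type, the sketch returns null and the caller would simply skip the metadata pass.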

I also created a dummy BodyContentHandler that drops all content from the 
second Parser, so only its metadata is kept.
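
The content-dropping part can be sketched with plain SAX. This is an assumption-laden illustration, not the handler from the patch: it uses the JDK's org.xml.sax.helpers.DefaultHandler (whose callbacks are all no-ops) in place of the dummy BodyContentHandler, a Map in place of Tika's Metadata, and a hypothetical parseImage method with a made-up "width" key in place of the second Parser.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-ins only; this is NOT the Tika API or the attached patch.
final class DiscardContentSketch {

    /** Stand-in for the second parser: emits body text and records one metadata entry. */
    static void parseImage(ContentHandler handler, Map<String, String> metadata)
            throws SAXException {
        char[] text = "body text nobody wants".toCharArray();
        handler.startDocument();
        handler.characters(text, 0, text.length);
        handler.endDocument();
        metadata.put("width", "640"); // illustrative key and value
    }

    /** Runs the stand-in parser with a no-op handler so that only metadata survives. */
    static Map<String, String> extractMetadataOnly() {
        Map<String, String> metadata = new HashMap<>();
        try {
            // DefaultHandler ignores every SAX event, so all content is dropped.
            parseImage(new DefaultHandler(), metadata);
        } catch (SAXException e) {
            throw new IllegalStateException("stand-in parser failed", e);
        }
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println(extractMetadataOnly());
    }
}
```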

Thoughts?

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, 
> TIKA-1445.Palsulich.102614.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, 
> consider how to add back in the metadata extraction capabilities by the other 
> Image parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
