[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich updated TIKA-1445:
----------------------------------
Attachment: TIKA-1445.Palsulich.102614.patch
Here is an updated patch implementing the idea above. I added a new public method to
CompositeParser and DefaultParser, {{getAllParsersFor(ParseContext,
MediaType)}}, which returns a list of all Parsers that support the given type.
TesseractOCRParser then searches this list for a second Parser that can handle
the image being parsed.
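A rough, self-contained model of that selection logic follows. It does not use Tika's real classes: {{ToyParser}} and {{ToyCompositeParser}} are hypothetical stand-ins, and media types are plain strings rather than MediaType objects. The sketch only shows the idea of returning every parser that supports a type, then choosing the first one that is not the OCR parser itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a parser: a name plus the media types it accepts.
// This is NOT Tika's Parser interface; it only models the selection step.
final class ToyParser {
    final String name;
    final Set<String> supportedTypes;

    ToyParser(String name, Set<String> supportedTypes) {
        this.name = name;
        this.supportedTypes = supportedTypes;
    }
}

// Hypothetical composite modelling getAllParsersFor: instead of returning the
// single parser normally chosen for a type, return every registered match.
final class ToyCompositeParser {
    private final List<ToyParser> parsers = new ArrayList<>();

    void register(ToyParser parser) {
        parsers.add(parser);
    }

    List<ToyParser> getAllParsersFor(String mediaType) {
        List<ToyParser> result = new ArrayList<>();
        for (ToyParser p : parsers) {
            if (p.supportedTypes.contains(mediaType)) {
                result.add(p);
            }
        }
        return result;
    }

    // Search the candidates for a "second" parser: the first match that is not
    // the OCR parser. Returns null when no other parser handles the type.
    ToyParser findSecondParser(String mediaType, ToyParser ocrParser) {
        for (ToyParser p : getAllParsersFor(mediaType)) {
            if (p != ocrParser) {
                return p;
            }
        }
        return null;
    }
}
```

With an OCR parser registered for image/png and image/jpeg and a metadata parser registered only for image/png, findSecondParser("image/png", ocr) returns the metadata parser, while image/jpeg yields null because only the OCR parser handles it.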
I created a dummy BodyContentHandler to drop all text content produced by the
second Parser, so that only its metadata is kept.
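The "drop the content, keep the metadata" step can be sketched with the JDK's SAX classes alone. Here the JDK's {{org.xml.sax.helpers.DefaultHandler}}, whose callbacks are all no-ops, plays the role of the dummy handler. {{ToyImageParser}} and {{MetadataOnly}} are hypothetical, and a plain Map stands in for Tika's Metadata object. This is a sketch under those assumptions, not the patch's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical "second parser": writes body text as SAX events and image
// properties into a metadata map. Not Tika's API; values are illustrative.
final class ToyImageParser {
    void parse(ContentHandler handler, Map<String, String> metadata) throws SAXException {
        metadata.put("width", "640");
        metadata.put("height", "480");
        char[] text = "caption text".toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
    }
}

final class MetadataOnly {
    // Run the second parser against a no-op handler: DefaultHandler ignores
    // every SAX event, so the text is discarded and only metadata survives.
    static Map<String, String> extract(ToyImageParser parser) {
        Map<String, String> metadata = new HashMap<>();
        try {
            parser.parse(new DefaultHandler(), metadata);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return metadata;
    }
}
```

The design choice is that the OCR parser remains the sole source of text, while the second parser contributes only metadata.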
Thoughts?
> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
> Key: TIKA-1445
> URL: https://issues.apache.org/jira/browse/TIKA-1445
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.8
>
> Attachments: TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back the metadata extraction capabilities of the other
> image parsers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)