[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

Chris A. Mattmann (JIRA) Tue, 18 Nov 2014 14:45:19 -0800

    [ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216960#comment-14216960 ]


Chris A. Mattmann commented on TIKA-1445:
-----------------------------------------

Hi Nick:

I think we need to be careful to define "users". In my case, "users" aren't 
developers (who I think you are talking about when discussing adding new 
parsers above). My users simply want metadata and parsed content that are 
currently split among multiple Parsers in Tika for the same MIME/MediaType. I 
could make one "super" Parser that combines them, or use the services trick 
per class to declare priority parsers, or delegates, or whatever. I think a 
much more modular, and thus more easily maintainable, approach would be to 
provide a mechanism that allows multiple Parsers to be called for the same 
MediaType, each filling the Metadata object and content stream.
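
To make the idea concrete, here is a rough sketch of what "multiple Parsers 
for the same MediaType" could look like. It is only an illustration, not a 
proposed patch: SimpleParser, MultiParser, and the "byte-count" key are 
hypothetical names invented here, and a plain Map and StringBuilder stand in 
for Tika's Metadata object and content stream. None of it is the real Tika API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiParserSketch {

    /** Hypothetical stand-in for a parser; NOT the real org.apache.tika.parser.Parser. */
    interface SimpleParser {
        void parse(byte[] input, Map<String, String> metadata, StringBuilder content);
    }

    /** Runs every parser registered for a media type, in registration order. */
    static class MultiParser {
        private final Map<String, List<SimpleParser>> parsersByType = new LinkedHashMap<>();

        void register(String mediaType, SimpleParser parser) {
            parsersByType.computeIfAbsent(mediaType, k -> new ArrayList<>()).add(parser);
        }

        void parse(String mediaType, byte[] input,
                   Map<String, String> metadata, StringBuilder content) {
            List<SimpleParser> parsers =
                    parsersByType.getOrDefault(mediaType, Collections.<SimpleParser>emptyList());
            // Each parser sees the same input and adds to the shared metadata and content.
            for (SimpleParser parser : parsers) {
                parser.parse(input, metadata, content);
            }
        }
    }

    /** Registers two placeholder parsers for image/png and runs both on one input. */
    static Map<String, String> demo(StringBuilder content) {
        MultiParser multi = new MultiParser();
        // Placeholder for an image metadata parser: records a made-up key.
        multi.register("image/png",
                (in, md, out) -> md.put("byte-count", String.valueOf(in.length)));
        // Placeholder for an OCR parser: appends fixed text instead of running Tesseract.
        multi.register("image/png", (in, md, out) -> out.append("recognized text"));

        Map<String, String> metadata = new LinkedHashMap<>();
        multi.parse("image/png", new byte[] {1, 2, 3}, metadata, content);
        return metadata;
    }

    public static void main(String[] args) {
        StringBuilder content = new StringBuilder();
        Map<String, String> metadata = demo(content);
        System.out.println(metadata + " / " + content);
    }
}
```

The sketch leaves open what should happen when two parsers write the same 
metadata key or both emit content; here the later parser simply overwrites or 
appends.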

That said, I don't have a solution yet, but I am trying to think of one. Glad 
to have the conversation with you guys here. It's a tough problem.


> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, 
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt, 
> TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, 
> consider how to add back in the metadata extraction capabilities by the other 
> Image parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
