[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216960#comment-14216960 ]
Chris A. Mattmann commented on TIKA-1445:
-----------------------------------------

Hi Nick:

I think we need to be careful to define "users". In my case, "users" aren't developers (who I think you are talking about when discussing adding new parsers above). My users simply want the metadata and parsing that are currently partitioned among multiple Parsers in Tika for the same MIME/MediaType.

I could make one "super" Parser that combines them together, use the services trick per class to declare priority parsers or delegates, or whatever. I think a much more modular, and thus more easily maintainable, way would be to provide a mechanism that allows multiple Parsers to be called for the same MediaType and to fill the Metadata object and content stream.

That said, I don't have a solution yet, but I am trying to think of one. Glad to have the conversation with you guys here. It's a tough problem.

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt,
> TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back in the metadata extraction capabilities by the other
> Image parsers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
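
For illustration only, the idea raised in the comment above (calling several Parsers for the same MediaType and letting each contribute to a shared Metadata object and content stream) might look roughly like the sketch below. The comment itself says no solution exists yet, so this is not the approach taken in TIKA-1445. Every type here (`SimpleParser`, `CompositeParser`, `stub`) is a hypothetical stand-in rather than Tika's real Parser API, and the policies chosen (earlier parsers win on metadata key conflicts, a failing parser is skipped) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are stand-ins, not Tika's actual Parser API.
public class MultiParserSketch {

    /** Simplified parser: may add metadata entries and returns any extracted text. */
    public interface SimpleParser {
        boolean supports(String mediaType);
        String parse(byte[] data, Map<String, String> metadata) throws Exception;
    }

    /** Runs every registered parser that supports the media type, in registration order. */
    public static final class CompositeParser {
        private final List<SimpleParser> parsers = new ArrayList<>();

        public CompositeParser add(SimpleParser parser) {
            parsers.add(parser);
            return this;
        }

        public String parse(String mediaType, byte[] data, Map<String, String> metadata) {
            StringBuilder text = new StringBuilder();
            for (SimpleParser parser : parsers) {
                if (!parser.supports(mediaType)) {
                    continue;
                }
                // Collect into a scratch map so a failing parser leaves no partial entries.
                Map<String, String> scratch = new LinkedHashMap<>();
                try {
                    String extracted = parser.parse(data, scratch);
                    // Assumed policy: on key conflicts the earlier parser wins.
                    scratch.forEach(metadata::putIfAbsent);
                    text.append(extracted);
                } catch (Exception e) {
                    // Assumed policy: one failing parser does not discard the others' results.
                }
            }
            return text.toString();
        }
    }

    /** Test double that adds one optional metadata entry and returns fixed text. */
    static SimpleParser stub(final String type, final String key,
                             final String value, final String text) {
        return new SimpleParser() {
            public boolean supports(String mediaType) {
                return type.equals(mediaType);
            }
            public String parse(byte[] data, Map<String, String> metadata) {
                if (key != null) {
                    metadata.put(key, value);
                }
                return text;
            }
        };
    }

    /** Builds a composite with two image/png stubs and one image/jpeg stub, then parses a PNG. */
    public static String demo(Map<String, String> metadata) {
        CompositeParser composite = new CompositeParser()
                .add(stub("image/png", null, null, "text found by OCR")) // stands in for an OCR parser
                .add(stub("image/png", "width", "640", ""))              // stands in for a metadata parser
                .add(stub("image/jpeg", "width", "800", ""));            // other type, so it is skipped
        return composite.parse("image/png", new byte[0], metadata);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        String text = demo(metadata);
        System.out.println(text + " " + metadata);
    }
}
```

In this sketch the OCR stand-in supplies the text and the metadata stand-in supplies the `width` entry, so a caller gets both from a single `parse` call. A real design would also have to decide how several parsers share one input stream and one SAX content handler, which the sketch sidesteps by passing a byte array and concatenating strings.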