[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216365#comment-14216365 ]
Tim Allison commented on TIKA-1445:
-----------------------------------

Copied from dev discussion to record points on this issue. Will not duplicate in future. Sorry!

On Issue 1: The proposal is that we'd send a fresh Metadata object to each parser and then combine that information into a new Metadata object via either add or set. If we go this route, we'll lose the restrictions that Properties may have originally held (e.g. a single value, as in TikaCoreProperties.TITLE).

On Issue 2: I think we're talking about different things. Yes, we'll definitely need to reset or spool the stream depending on its length. My concern was more with the handlers. If the first parser calls endDocument() and we don't shield that call, then someone using the BodyContentHandler might not see contents from the second or third parser, because the initial parser "ended" the document. I need to test this concern, but I think that this was the root of TIKA-1124.

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, consider how to add back in the metadata extraction capabilities by the other Image parsers.
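
To make the handler concern under Issue 2 concrete, here is a rough sketch of what "shielding" endDocument() could look like. It uses only the JDK's org.xml.sax classes, not Tika itself. The names DocumentBoundaryShield, RecordingHandler, ShieldDemo and fakeParse are made up for illustration, and fakeParse merely stands in for a real parser. It shows the idea and is not the approach taken in any of the attached patches.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical decorator: forwards all SAX events except the document boundaries. */
class DocumentBoundaryShield extends XMLFilterImpl {
    DocumentBoundaryShield(ContentHandler downstream) {
        setContentHandler(downstream);
    }

    @Override
    public void startDocument() {
        // swallowed: the caller opens the document exactly once
    }

    @Override
    public void endDocument() {
        // swallowed: the caller closes the document exactly once
    }
}

/** Hypothetical stand-in for the user's handler; counts boundaries and collects text. */
class RecordingHandler extends DefaultHandler {
    final StringBuilder text = new StringBuilder();
    int startCount;
    int endCount;

    @Override
    public void startDocument() {
        startCount++;
    }

    @Override
    public void endDocument() {
        endCount++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
}

class ShieldDemo {
    /** Stand-in for a parser: opens the document, emits text, closes the document. */
    static void fakeParse(ContentHandler handler, String output) throws SAXException {
        handler.startDocument();
        char[] chars = output.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endDocument();
    }

    /** Runs several fake parsers in sequence against one shielded handler. */
    static RecordingHandler runShielded(String... outputs) throws SAXException {
        RecordingHandler downstream = new RecordingHandler();
        ContentHandler shield = new DocumentBoundaryShield(downstream);
        downstream.startDocument();      // opened once by the caller
        for (String output : outputs) {
            fakeParse(shield, output);   // each parser's own start/end is swallowed
        }
        downstream.endDocument();        // closed once by the caller
        return downstream;
    }

    public static void main(String[] args) throws SAXException {
        RecordingHandler shielded = runShielded("ocr text ", "image metadata");
        System.out.println("shielded endDocument calls: " + shielded.endCount);

        RecordingHandler unshielded = new RecordingHandler();
        fakeParse(unshielded, "ocr text ");
        fakeParse(unshielded, "image metadata");
        System.out.println("unshielded endDocument calls: " + unshielded.endCount);
    }
}
```

With the shield in place, the downstream handler sees one startDocument() and one endDocument() no matter how many parsers run. Without it, each parser ends the document again. Whether BodyContentHandler actually drops content after the first endDocument() is the part that still needs testing.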