[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215170#comment-14215170 ]
Luis Filipe Nassif commented on TIKA-1445:
------------------------------------------

+1 to respecting the order of parsers in the service file, instead of sorting by full class name.

1) Loading the ImageMetadataParsers through a service file can, afaik, have the same problem of different parsers trying to set the same metadata values. Metadata values are multivalued, so can we simply add the values produced by the different parsers?

2) Yes, I think CompositeParser should append the content produced by all of the parsers that support the document. If users do not want all of the parsers, they should customize the parser service loading file.

3) It is a good idea to identify which parser produced each piece of content with a <div> tag.

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt,
> TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back in the metadata extraction capabilities by the other
> Image parsers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
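
To make points 1) to 3) of the comment above concrete, here is a rough, self-contained sketch of what "add the values" and "append the content" could look like. It does not use any Tika classes: CompositeSketch, ParserOutput, mergeMetadata and appendContent are hypothetical names invented for this illustration, and the parser names and metadata keys in main are made-up sample data. It only assumes that each parser's result is available as a name, a metadata map and a text string, in the order the parsers are listed in the service file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Tika code: models the proposal with plain collections.
final class CompositeSketch {

    // Hypothetical stand-in for what one parser produced for a document.
    static final class ParserOutput {
        final String parserName;
        final Map<String, String> metadata;
        final String text;

        ParserOutput(String parserName, Map<String, String> metadata, String text) {
            this.parserName = parserName;
            this.metadata = metadata;
            this.text = text;
        }
    }

    // Point 1: treat metadata as multivalued and add every value,
    // instead of letting a later parser overwrite an earlier one.
    // The list order is assumed to be the order in the service file.
    static Map<String, List<String>> mergeMetadata(List<ParserOutput> outputs) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (ParserOutput out : outputs) {
            for (Map.Entry<String, String> e : out.metadata.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                      .add(e.getValue());
            }
        }
        return merged;
    }

    // Points 2 and 3: append each parser's content, wrapped in a <div>
    // that names the parser. A real implementation would need to escape
    // the text; this sketch does not.
    static String appendContent(List<ParserOutput> outputs) {
        StringBuilder sb = new StringBuilder();
        for (ParserOutput out : outputs) {
            sb.append("<div class=\"").append(out.parserName).append("\">")
              .append(out.text)
              .append("</div>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> imageMeta = new LinkedHashMap<>();
        imageMeta.put("Content-Type", "image/jpeg");
        imageMeta.put("width", "640");

        List<ParserOutput> outputs = Arrays.asList(
            new ParserOutput("ocr",
                Collections.singletonMap("Content-Type", "image/jpeg"),
                "text recognized by OCR"),
            new ParserOutput("image-metadata", imageMeta, ""));

        System.out.println(mergeMetadata(outputs));
        System.out.print(appendContent(outputs));
    }
}
```

Note that in this sketch a key set to the same value by two parsers ends up with that value twice, so simply adding values may also require deciding whether duplicates should be dropped.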