[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215170#comment-14215170 ]
Luis Filipe Nassif commented on TIKA-1445:
------------------------------------------
+1 to respect the order of parsers in the service file, instead of sorting the
full class names.
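
To illustrate the difference, here is a minimal sketch, not Tika's actual
loader code. The class ServiceOrderSketch and its methods are hypothetical, and
the two-line service file is an assumed example. It reads provider names in the
order they are declared and compares that with sorting by full class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not how Tika loads its parsers.
public class ServiceOrderSketch {

    // Returns provider class names in declaration order, ignoring blank lines
    // and '#' comments (the java.util.ServiceLoader file format).
    public static List<String> declaredOrder(List<String> serviceFileLines) {
        List<String> names = new ArrayList<>();
        for (String line : serviceFileLines) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty() && !names.contains(name)) {
                names.add(name);
            }
        }
        return names;
    }

    // What sorting by full class name would produce instead.
    public static List<String> sortedByClassName(List<String> serviceFileLines) {
        List<String> names = declaredOrder(serviceFileLines);
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Assumed contents of a parser service file, OCR parser listed first.
        List<String> lines = Arrays.asList(
            "# assumed example contents",
            "org.apache.tika.parser.ocr.TesseractOCRParser",
            "org.apache.tika.parser.jpeg.JpegParser");
        System.out.println(declaredOrder(lines));      // Tesseract first, as declared
        System.out.println(sortedByClassName(lines));  // JpegParser first after sorting
    }
}
```

With declaration order, whoever edits the service file decides which parser
comes first; with sorting, the package name decides.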
1) Service-loading ImageMetadataParsers can, afaik, have the same problem:
different parsers trying to set the same metadata values. Metadata values are
multivalued, so can we simply add the values produced by the different parsers?
2) Yes, I think CompositeParser should append the content produced by all the
parsers that support the type. If users do not want all the parsers, they
should customize the parser service loading file.
3) It is a good idea to identify which parser produced each piece of content
with a <div> tag.
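
Points 1) and 3) can be sketched roughly as below. This is only an
illustration under assumptions, not a proposed patch: it uses plain Java
collections rather than Tika's Metadata class (which offers add(name, value)
for multivalued fields), the class CompositeOutputSketch is hypothetical,
skipping exact duplicate values is my assumption, and so is putting the parser
name in the div's class attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and behaviour are assumptions.
public class CompositeOutputSketch {

    // Adds every value from 'source' to 'target' without overwriting anything.
    // Exact duplicates are skipped (an assumption, not something decided here).
    public static void mergeMetadata(Map<String, List<String>> target,
                                     Map<String, List<String>> source) {
        for (Map.Entry<String, List<String>> e : source.entrySet()) {
            List<String> values =
                target.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            for (String v : e.getValue()) {
                if (!values.contains(v)) {
                    values.add(v);
                }
            }
        }
    }

    // Wraps one parser's text in a <div> that names the parser.
    // XML escaping of the content is omitted to keep the sketch short.
    public static String wrapContent(String parserName, String content) {
        return "<div class=\"" + parserName + "\">" + content + "</div>";
    }

    public static void main(String[] args) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        Map<String, List<String>> fromOcr = new LinkedHashMap<>();
        fromOcr.put("Content-Type", Arrays.asList("image/jpeg"));
        Map<String, List<String>> fromImage = new LinkedHashMap<>();
        fromImage.put("Content-Type", Arrays.asList("image/jpeg"));
        fromImage.put("width", Arrays.asList("640"));

        mergeMetadata(merged, fromOcr);
        mergeMetadata(merged, fromImage);
        System.out.println(merged);
        System.out.println(wrapContent("ocr", "text found in the image"));
    }
}
```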
> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
> Key: TIKA-1445
> URL: https://issues.apache.org/jira/browse/TIKA-1445
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.8
>
> Attachments: TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt,
> TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back in the metadata extraction capabilities by the other
> Image parsers.