[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

Luis Filipe Nassif (JIRA) Mon, 17 Nov 2014 13:05:42 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215170#comment-14215170
 ]


Luis Filipe Nassif commented on TIKA-1445:
------------------------------------------

+1 to respecting the order of parsers in the service file, instead of sorting 
by fully qualified class name.
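
To make the difference concrete, here is a minimal sketch that contrasts 
declaration order with name-sorted order. It assumes a service file is a list 
of lines with one class name per line and `#` comments. It is not Tika's 
actual loader; the class `ServiceOrderSketch` and the `org.example.*` parser 
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ServiceOrderSketch {

    /** Class names in the order they are declared; comments, blanks and repeats are skipped. */
    public static List<String> declaredOrder(List<String> serviceFileLines) {
        List<String> names = new ArrayList<>();
        for (String line : serviceFileLines) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty() && !names.contains(name)) {
                names.add(name);
            }
        }
        return names;
    }

    /** The same names sorted alphabetically, which discards the order chosen in the file. */
    public static List<String> sortedByName(List<String> serviceFileLines) {
        List<String> names = declaredOrder(serviceFileLines);
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical service file content: the OCR parser is listed first on purpose.
        List<String> lines = Arrays.asList(
                "# image parsers",
                "org.example.OcrParser",
                "org.example.ImageMetadataParser");
        System.out.println("declared: " + declaredOrder(lines));
        System.out.println("sorted:   " + sortedByName(lines));
    }
}
```

With declaration order, whoever edits the service file decides which parser 
comes first; with sorting, that decision depends on how the classes happen 
to be named.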

1) Creating a service loader for ImageMetadataParsers can, afaik, have the same 
problem of different parsers trying to set the same metadata fields. Metadata 
is multivalued, so can we simply add the values produced by the different 
parsers?

2) Yes, I think CompositeParser should append the content produced by the 
different supported parsers. If users do not want all the parsers, they should 
customize the parser service loading file.

3) It is a good idea to identify which parser produced each piece of content 
by wrapping it in a <div> tag.
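
Points 1) to 3) can be sketched together as follows. This is only an 
illustration under stated assumptions, not how CompositeParser works: 
`CombinedOutputSketch` and `ParserOutput` are hypothetical types, metadata is 
modelled as a plain map of lists rather than Tika's Metadata class, and using 
the parser name as the `class` attribute of the <div> is an assumed convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinedOutputSketch {

    /** Hypothetical holder for what a single parser produced; not a Tika type. */
    public static final class ParserOutput {
        final String parserName;
        final Map<String, List<String>> metadata;
        final String content;

        public ParserOutput(String parserName, Map<String, List<String>> metadata, String content) {
            this.parserName = parserName;
            this.metadata = metadata;
            this.content = content;
        }
    }

    /** Point 1: values for the same key are added, never overwritten. */
    public static Map<String, List<String>> mergeMetadata(List<ParserOutput> outputs) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (ParserOutput out : outputs) {
            for (Map.Entry<String, List<String>> e : out.metadata.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    /** Points 2 and 3: content is appended in parser order, one <div> per parser. */
    public static String appendContent(List<ParserOutput> outputs) {
        StringBuilder sb = new StringBuilder();
        for (ParserOutput out : outputs) {
            sb.append("<div class=\"").append(out.parserName).append("\">")
              .append(out.content)
              .append("</div>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two hypothetical parsers that both report the same metadata key.
        ParserOutput ocr = new ParserOutput("OcrParser",
                Collections.singletonMap("width", Arrays.asList("640")), "<p>recognized text</p>");
        ParserOutput image = new ParserOutput("ImageMetadataParser",
                Collections.singletonMap("width", Arrays.asList("640 pixels")), "");
        List<ParserOutput> outputs = Arrays.asList(ocr, image);
        System.out.println(mergeMetadata(outputs));
        System.out.print(appendContent(outputs));
    }
}
```

Simply adding values keeps everything each parser reported, but it also keeps 
near-duplicates such as the two width values in `main`, so consumers would 
need to tolerate several values per key.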

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, 
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt, 
> TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, 
> consider how to add back in the metadata extraction capabilities by the other 
> Image parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
