[ 
https://issues.apache.org/jira/browse/TIKA-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983163#comment-13983163
 ] 

Tim Allison commented on TIKA-1283:
-----------------------------------

I look forward to feedback on this issue.  I think there is a fairly clear 
distinction between a thumbnail and an attached image, but this might get 
murky for some formats.

On specific document types, there are some issues:
* RTF is easy
* ooxml now has a literal "thumbnail", but there are also emf and wmf files 
that do not have a literal thumbnail "relationship". How do we handle these?
* pre-ooxml office: I haven't dug deeply yet, but thumbnails there are emf and 
wmf, no?
* PDF: I'd also like to be able to distinguish between attached image files 
and embedded image files (TIKA-1268), but is this better handled as a separate 
issue?
* other formats?
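
To make the consumer side of the proposal concrete, here is a minimal Java 
sketch, not actual Tika code. It assumes that each embedded document's 
metadata can be viewed as a plain string map and that the proposed 
"tika:thumbnail" key carries the value "true" for thumbnails (the issue does 
not fix the value). The class and method names (ThumbnailRouting, isThumbnail, 
route) and the sample resource names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThumbnailRouting {
    // Key proposed in this issue; the "true" value is an assumption.
    public static final String THUMBNAIL_KEY = "tika:thumbnail";

    /** True if the embedded resource's metadata flags it as a thumbnail. */
    public static boolean isThumbnail(Map<String, String> embeddedMetadata) {
        // A missing key yields null, which compares as "not a thumbnail".
        return "true".equalsIgnoreCase(embeddedMetadata.get(THUMBNAIL_KEY));
    }

    /** Hypothetical consumer policy: skip thumbnails, index everything else. */
    public static String route(Map<String, String> embeddedMetadata) {
        return isThumbnail(embeddedMetadata) ? "skip" : "index";
    }

    public static void main(String[] args) {
        Map<String, String> thumb = new LinkedHashMap<>();
        thumb.put("resourceName", "thumbnail.jpeg");
        thumb.put(THUMBNAIL_KEY, "true");

        Map<String, String> attachment = new LinkedHashMap<>();
        attachment.put("resourceName", "image1.emf");

        System.out.println(route(thumb));
        System.out.println(route(attachment));
    }
}
```

With a flag like this, the emf/wmf question above comes down to whether the 
parser sets the key on those files; the consumer logic stays the same.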

> Add "thumbnail" as possible metadata item to TikaCoreProperties
> ---------------------------------------------------------------
>
>                 Key: TIKA-1283
>                 URL: https://issues.apache.org/jira/browse/TIKA-1283
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: Tim Allison
>            Priority: Minor
>
> TIKA-90 originally requested to add thumbnails to a document's metadata.
> I'd like to have a unified way of determining whether an embedded 
> document/resource is a thumbnail or a regular attachment.
> With the changes in TIKA-1223 (ooxml) and TIKA-1010 (rtf), we are now pulling 
> out more thumbnails than before.
> I propose adding "tika:thumbnail" to the metadata of each embedded document.  
> The consumer can then determine what to do with the embedded resource.



--
This message was sent by Atlassian JIRA
(v6.2#6252)