[
https://issues.apache.org/jira/browse/TIKA-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983813#comment-13983813
]
Tim Allison edited comment on TIKA-1283 at 4/29/14 12:47 AM:
-------------------------------------------------------------
[~thaichat04], thank you, as always. By "thumbnail," I'd also want to include
images/icons of documents that are included only for display purposes. For
example, the icon image (image1.emf) in test-documents/EmbeddedPDF.docx doesn't
have a "relationship"=thumbnail, but I'd want to include that as a thumbnail
because it appears as a <v:shape> within a <w:object>.
The point you make about the differences in handling of these by application is
right on. Each application links thumbnail images to the underlying data in
different ways, and we'll have to go application by application to do this
correctly (whether we go with this or TIKA-90).
I'm not held to the original proposal in this issue, and I like the clarity of
TIKA-90 quite a bit. Some other thoughts: the signature I proposed above
won't work because a given embedded resource can have more than one thumbnail
(at least for RTFs), and it misses metadata around the thumbnail image (such as
the mediaType of the thumbnail).
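
To make the two concerns concrete, here is a minimal sketch of one shape that
would allow several thumbnails per embedded resource and keep each
thumbnail's media type alongside its bytes. The class and method names
(EmbeddedResourceSketch, Thumbnail, addThumbnail, getThumbnails) are
hypothetical and are not part of Tika or of the earlier proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Tika.
class EmbeddedResourceSketch {

    // One thumbnail plus the metadata that a bare byte[] would lose.
    static final class Thumbnail {
        final String mediaType;
        final byte[] data;

        Thumbnail(String mediaType, byte[] data) {
            this.mediaType = mediaType;
            this.data = data;
        }
    }

    private final String name;
    // A list rather than a single field: a resource may have several thumbnails.
    private final List<Thumbnail> thumbnails = new ArrayList<>();

    EmbeddedResourceSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void addThumbnail(Thumbnail thumbnail) {
        thumbnails.add(thumbnail);
    }

    // Read-only view so callers cannot modify the internal list.
    List<Thumbnail> getThumbnails() {
        return Collections.unmodifiableList(thumbnails);
    }
}
```

Whether this information would live in a dedicated type like this or be
carried entirely in each thumbnail's own metadata is left open here.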
> Add "thumbnail" as possible metadata item to TikaCoreProperties
> ---------------------------------------------------------------
>
> Key: TIKA-1283
> URL: https://issues.apache.org/jira/browse/TIKA-1283
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Reporter: Tim Allison
> Priority: Minor
>
> TIKA-90 originally requested to add thumbnails to a document's metadata.
> I'd like to have a unified way of determining whether an embedded
> document/resource is a thumbnail or a regular attachment.
> With the changes in TIKA-1223 (ooxml) and TIKA-1010 (rtf), we are now pulling
> out more thumbnails than before.
> I propose adding "tika:thumbnail" to the metadata of each thumbnail image.
> The consumer can then determine what to do with the embedded resource based
> on the metadata.
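
As a rough sketch of the consumer side of this proposal: assuming each
embedded resource's metadata is visible as simple key/value pairs, and
assuming a thumbnail is flagged with the value "true" under "tika:thumbnail"
(the value format is not specified in this issue), a consumer could separate
regular attachments from thumbnails like this. A plain Map stands in for
Tika's Metadata class, and ThumbnailFilter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consumer-side helper; a plain Map stands in for Tika's Metadata.
class ThumbnailFilter {

    // Key proposed in this issue. The "true" value below is an assumption.
    static final String THUMBNAIL_KEY = "tika:thumbnail";

    // An embedded resource counts as a thumbnail when the flag is set to "true".
    static boolean isThumbnail(Map<String, String> metadata) {
        return "true".equalsIgnoreCase(metadata.get(THUMBNAIL_KEY));
    }

    // Keep only regular attachments, dropping anything flagged as a thumbnail.
    static List<Map<String, String>> attachmentsOnly(List<Map<String, String>> embedded) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> metadata : embedded) {
            if (!isThumbnail(metadata)) {
                result.add(metadata);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data only: an icon flagged as a thumbnail and a real attachment.
        Map<String, String> icon = new HashMap<>();
        icon.put("resourceName", "image1.emf");
        icon.put(THUMBNAIL_KEY, "true");

        Map<String, String> pdf = new HashMap<>();
        pdf.put("resourceName", "attachment.pdf");

        List<Map<String, String>> embedded = new ArrayList<>();
        embedded.add(icon);
        embedded.add(pdf);

        for (Map<String, String> metadata : attachmentsOnly(embedded)) {
            System.out.println(metadata.get("resourceName"));
        }
    }
}
```

Resources without the key are treated as regular attachments, so parsers
that do not set the flag would keep their current behavior.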