[
https://issues.apache.org/jira/browse/TIKA-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523880#comment-15523880
]
Tim Allison commented on TIKA-1903:
-----------------------------------
This is a move away from my earlier proposal, and it builds on one of Ray's
earlier points. On further thought...
Would there be objections to treating embedded metadata objects both:
1) as we're doing now -- extracting the pieces that we think most people want
into the Metadata object
AND
2) as embedded objects. We could parse the embedded XMP/XFA as an embedded
document via the existing embedded document parsing mechanism. The default
would be the DcXmlParser, and we'd add a new
{{TikaCoreProperties.EmbeddedResourceType}}, namely {{METADATA}}. If users
don't want this behavior, they can add their own parsers for these mime types
or they can turn off parsing for these mime types.
This is still heavily Tika-based, but users could run files through
{{/unpack}} or {{-z}}, and they'd get the XMP/XFA files back.
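A rough sketch of what doing both 1) and 2) might look like. This is plain Java with hypothetical stand-in types ({{ResourceType}}, {{EmbeddedResource}}, {{Result}}, {{handleXmp}}), not the real Tika API. The regex title lookup and the {{application/rdf+xml}} mime type are assumptions made for illustration only:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the proposal; none of these types are actual Tika classes.
final class EmbeddedMetadataSketch {

    // Hypothetical stand-in for TikaCoreProperties.EmbeddedResourceType,
    // including the proposed METADATA value.
    enum ResourceType { ATTACHMENT, INLINE, METADATA }

    // Hypothetical record of one resource handed to the embedded-document mechanism.
    static final class EmbeddedResource {
        final String mimeType;
        final ResourceType type;
        final String content;

        EmbeddedResource(String mimeType, ResourceType type, String content) {
            this.mimeType = mimeType;
            this.type = type;
            this.content = content;
        }
    }

    // What the container parser would produce for one XMP packet.
    static final class Result {
        final Map<String, String> metadata = new LinkedHashMap<>();
        final List<EmbeddedResource> embedded = new ArrayList<>();
    }

    // Deliberately naive: real XMP titles are usually nested in rdf:Alt/rdf:li.
    private static final Pattern TITLE =
            Pattern.compile("<dc:title>(.*?)</dc:title>", Pattern.DOTALL);

    static Result handleXmp(String xmpPacket, boolean parseEmbeddedMetadata) {
        Result result = new Result();

        // 1) As now: copy the pieces most people want into the metadata map.
        Matcher m = TITLE.matcher(xmpPacket);
        if (m.find()) {
            result.metadata.put("dc:title", m.group(1).trim());
        }

        // 2) Additionally: expose the whole packet as an embedded document,
        //    tagged METADATA, unless the user has turned that off.
        if (parseEmbeddedMetadata) {
            result.embedded.add(new EmbeddedResource(
                    "application/rdf+xml", ResourceType.METADATA, xmpPacket));
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = handleXmp("<x><dc:title>Report</dc:title></x>", true);
        System.out.println(r.metadata);
        System.out.println(r.embedded.size() + " embedded resource(s)");
    }
}
```

Passing {{false}} for the second argument models turning off parsing for these mime types: the selected fields still land in the metadata map, but no embedded resource is emitted.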
> Allow for more flexibility in handling embedded metadata objects (e.g. XMP)
> ---------------------------------------------------------------------------
>
> Key: TIKA-1903
> URL: https://issues.apache.org/jira/browse/TIKA-1903
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> On TIKA-1607, we veered a bit from allowing flexible metadata structures to
> how to handle embedded metadata documents, such as XMP. Let's use this issue
> to discuss and design.