[
https://issues.apache.org/jira/browse/TIKA-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523880#comment-15523880
]
Tim Allison edited comment on TIKA-1903 at 9/26/16 6:54 PM:
------------------------------------------------------------
On further thought... This moves away from my earlier proposal, and it builds
on one of Ray's earlier points.
Would there be objections to treating embedded metadata objects both:
1) as we're doing now -- extracting the pieces that we think most people want
into the Metadata object
AND
2) as embedded objects. We could parse the embedded XMP/XFA as an embedded
document via the existing embedded document parsing mechanism. The default
would be the DcXmlParser, and we'd add a new
{{TikaCoreProperties.EmbeddedResourceType}}, namely {{METADATA}}. If users
don't want this behavior, they can add their own parsers for these mime types
or they can turn off parsing for these mime types.
This is still heavily Tika-based, but users could send files to tika-server's
{{/unpack}} endpoint or run tika-app with {{-z}}, and they'd get the XMP/XFA
files back as extracted files.
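To make the two-track idea concrete, here is a rough, self-contained sketch. It does not use Tika's classes: {{ResourceType}}, {{EmbeddedResource}}, {{Result}} and {{handle}} are hypothetical stand-ins invented for illustration, and the mime type string in the usage example is only a placeholder. The real change would go through the existing embedded document parsing mechanism.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EmbeddedMetadataSketch {

    /** Hypothetical stand-in for an embedded resource type enum with the proposed METADATA value. */
    enum ResourceType { INLINE, ATTACHMENT, METADATA }

    /** Hypothetical holder for one embedded object handed on for embedded-document parsing. */
    static final class EmbeddedResource {
        final String mimeType;
        final ResourceType type;
        final String payload;

        EmbeddedResource(String mimeType, ResourceType type, String payload) {
            this.mimeType = mimeType;
            this.type = type;
            this.payload = payload;
        }
    }

    /** The container's flat metadata plus any embedded objects that were emitted. */
    static final class Result {
        final Map<String, String> metadata = new LinkedHashMap<>();
        final List<EmbeddedResource> embedded = new ArrayList<>();
    }

    /**
     * Track 1: copy the commonly wanted fields into the container's metadata (as now).
     * Track 2: also emit the raw metadata packet as an embedded object of type METADATA,
     * unless parsing has been turned off for its mime type.
     */
    static Result handle(String mimeType, String payload,
                         Map<String, String> commonFields,
                         Set<String> disabledMimeTypes) {
        Result result = new Result();
        result.metadata.putAll(commonFields);
        if (!disabledMimeTypes.contains(mimeType)) {
            result.embedded.add(new EmbeddedResource(mimeType, ResourceType.METADATA, payload));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("dc:title", "Example");
        // "application/rdf+xml" is a placeholder mime type for the XMP packet.
        Result r = handle("application/rdf+xml", "<x:xmpmeta/>", fields, new HashSet<>());
        System.out.println(r.metadata + " embedded=" + r.embedded.size());
    }
}
```

Passing the mime type in {{disabledMimeTypes}} models the "turn off parsing for these mime types" option: the extracted fields still land in the metadata, but no embedded object is emitted.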
> Allow for more flexibility in handling embedded metadata objects (e.g. XMP)
> ---------------------------------------------------------------------------
>
> Key: TIKA-1903
> URL: https://issues.apache.org/jira/browse/TIKA-1903
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> On TIKA-1607, we veered a bit from the question of allowing flexible metadata
> structures to the question of how to handle embedded metadata documents, such
> as XMP. Let's use this issue to discuss and design.