[ 
https://issues.apache.org/jira/browse/TIKA-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523880#comment-15523880
 ] 

Tim Allison edited comment on TIKA-1903 at 9/26/16 6:54 PM:
------------------------------------------------------------

On further thought... This is a move away from my earlier proposal, and it 
builds on one of Ray's earlier points.

Would there be objections to treating embedded metadata objects both:

1) as we're doing now -- extracting the pieces that we think most people want 
into the Metadata object

AND

2) as embedded objects.  We could parse the embedded XMP/XFA as an embedded 
document via the existing embedded document parsing mechanism.  The default 
parser would be the {{DcXMLParser}}, and we'd add a new value to 
{{TikaCoreProperties.EmbeddedResourceType}}, namely {{METADATA}}.  If users 
don't want this behavior, they can add their own parsers for these MIME types 
or they can turn off parsing for these MIME types. 

This is still heavily Tika-based, but users could run files through the 
tika-server {{/unpack}} endpoint or the tika-app {{-z}} option, and they'd get 
the raw XMP/XFA files.
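
A minimal sketch of the dual handling proposed above, using only the JDK. 
Every name in it ({{EmbeddedMetadataSketch}}, {{EmbeddedResource}}, 
{{CollectingHandler}}, {{process}}, the local {{ResourceType}} enum and the 
MIME type constant) is a hypothetical stand-in, not Tika's API, and the regex 
is a deliberately crude substitute for real XMP parsing:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmbeddedMetadataSketch {

    // Hypothetical mirror of the proposed enum, with METADATA as the new value.
    enum ResourceType { INLINE, ATTACHMENT, METADATA }

    // Hypothetical container for one object passed to the embedded-document mechanism.
    static final class EmbeddedResource {
        final String mimeType;
        final ResourceType type;
        final byte[] bytes;

        EmbeddedResource(String mimeType, ResourceType type, byte[] bytes) {
            this.mimeType = mimeType;
            this.type = type;
            this.bytes = bytes;
        }
    }

    // Hypothetical handler; the flag models "turn off parsing for these MIME types".
    static final class CollectingHandler {
        final boolean parseMetadataObjects;
        final List<EmbeddedResource> received = new ArrayList<>();

        CollectingHandler(boolean parseMetadataObjects) {
            this.parseMetadataObjects = parseMetadataObjects;
        }

        void handle(EmbeddedResource resource) {
            if (parseMetadataObjects) {
                received.add(resource);
            }
        }
    }

    // Assumed MIME type for the XMP packet; chosen for illustration only.
    static final String XMP_MIME = "application/rdf+xml";

    private static final Pattern TITLE =
            Pattern.compile("<dc:title>(.*?)</dc:title>", Pattern.DOTALL);

    static Map<String, String> process(byte[] xmp, CollectingHandler handler) {
        // 1) Copy the commonly wanted pieces into the parent's metadata.
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher m = TITLE.matcher(new String(xmp, StandardCharsets.UTF_8));
        if (m.find()) {
            metadata.put("dc:title", m.group(1).trim());
        }
        // 2) Also hand the raw packet over as an embedded object typed METADATA.
        handler.handle(new EmbeddedResource(XMP_MIME, ResourceType.METADATA, xmp));
        return metadata;
    }

    public static void main(String[] args) {
        byte[] xmp = "<x:xmpmeta><dc:title>Report</dc:title></x:xmpmeta>"
                .getBytes(StandardCharsets.UTF_8);
        CollectingHandler handler = new CollectingHandler(true);
        Map<String, String> metadata = process(xmp, handler);
        System.out.println(metadata + " / embedded objects: " + handler.received.size());
    }
}
```

With the flag set to {{false}}, the parent metadata is still populated but no 
embedded object is emitted, which corresponds to the opt-out described in 2).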




was (Author: [email protected]):
This is move away from my early proposal...  This builds on one of Ray's 
earlier points...on further thought...

Would there be objections to treating embedded metadata objects both:

1) as we're doing now -- extracting the pieces that we think most people want 
into the Metadata object

AND

2) as embedded objects.  We could parse the embedded XMP/XFA as an embedded 
document via the existing embedded document parsing mechanism.  The default 
would be the DcXmlParser, and we'd add a new 
{{TikaCoreProperties.EmbeddedResourceType}}, namely {{METADATA}}.  If users 
don't want this behavior, they can add their own parsers for these mime types 
or they can turn off parsing for these mime types. 

This is still heavily Tika-based, but users could {{/unpack}} files or {{-z}} 
files, and they'd get the XMP/XFA files.



> Allow for more flexibility in handling embedded metadata objects (e.g. XMP)
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-1903
>                 URL: https://issues.apache.org/jira/browse/TIKA-1903
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>
> On TIKA-1607, we veered a bit from allowing flexible metadata structures to 
> how to handle embedded metadata documents, such as XMP.  Let's use this issue 
> to discuss and design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
