[ 
https://issues.apache.org/jira/browse/TIKA-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jörg Ehrlich updated TIKA-929:
------------------------------

    Attachment: tika_OOXMLOffice_namespaces.patch

This patch should help to resolve this issue.

The patch contains the following:
* Definitions of the OOXML namespace properties in Tika-core, except for 
those properties that already have equivalent definitions in the Office 
Namespace interface.
* Deprecation of the old properties in the MSOffice interface.
* Adjustment of the related parsers to additionally map to the new OOXML 
properties.
* Adjustment of the related tests.
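
To illustrate the "additionally map" step, here is a minimal, self-contained sketch of the idea. It is not code from the patch: the class, interface, and key names below are hypothetical, and a plain Map stands in for the Tika metadata object. The point is only that a parser writes each extracted value under the new namespaced key and also under the old, now deprecated key, so existing callers keep working during the transition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: all names here are hypothetical and not taken from the patch. */
public class DualMetadataMapping {

    /** Legacy, un-namespaced keys in the style of the old definitions. */
    interface LegacyOfficeKeys {
        @Deprecated
        String PAGE_COUNT = "Page-Count";
    }

    /** New keys that carry a namespace prefix (prefix chosen for illustration). */
    interface NamespacedOfficeKeys {
        String PREFIX = "extended-properties";
        String PAGES = PREFIX + ":" + "Pages";
    }

    /** Writes one extracted value under both the new key and the deprecated key. */
    static void setPageCount(Map<String, String> metadata, int pages) {
        String value = Integer.toString(pages);
        metadata.put(NamespacedOfficeKeys.PAGES, value);
        // Still populated so that callers using the old key are not broken.
        metadata.put(LegacyOfficeKeys.PAGE_COUNT, value);
    }

    public static void main(String[] args) {
        // A plain map stands in for the real metadata container (assumption).
        Map<String, String> metadata = new LinkedHashMap<>();
        setPageCount(metadata, 12);
        System.out.println(metadata);
    }
}
```

Once callers have moved to the namespaced key, the second put and the deprecated constant can be dropped without touching the rest of the parser.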
                
> Consistent, namespaced definitions for office file related metadata
> -------------------------------------------------------------------
>
>                 Key: TIKA-929
>                 URL: https://issues.apache.org/jira/browse/TIKA-929
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Nick Burch
>         Attachments: tika_OOXMLOffice_namespaces.patch
>
>
> Currently, we have the MSOffice metadata definitions, which are a mixture of 
> Properties and Strings, none of them namespaced. Despite the name, the keys 
> apply to a wide range of Office documents (not just MS ones), and the keys 
> are taken from a mixture of sources.
> Similar to TIKA-925 / TIKA-928, we should replace these with prefixed 
> versions drawn from a few well-known, externally defined namespaces, then 
> deprecate the old ones.


