[
https://issues.apache.org/jira/browse/TIKA-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jörg Ehrlich updated TIKA-929:
------------------------------
Attachment: tika_OOXMLOffice_namespaces.patch
This patch should help to resolve this issue.
The patch contains the following:
* Definition of the OOXML namespace properties in Tika-core, except those
properties that already have equivalent definitions in the Office namespace
interface.
* Deprecation of the old properties in the MSOffice interface.
* Adjustment of the related parsers to additionally map to the new OOXML
properties.
* Adjustment of the related tests.
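
As a rough illustration of the "additionally map" step above, the sketch below writes one extracted value under both an old, un-namespaced key and a new, prefixed key, so existing consumers keep working during the deprecation period. It is plain Java with no Tika dependency. The class name, method name, and both key strings are assumptions made for this example and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DualKeyMapping {
    // Hypothetical key names, for illustration only; the actual names
    // are whatever the attached patch defines.
    static final String LEGACY_PAGE_COUNT = "Page-Count";
    static final String NAMESPACED_PAGE_COUNT = "extended-properties:Pages";

    // Stores the same value under the legacy key and the namespaced key,
    // which is the transitional behaviour described for the parsers.
    static Map<String, String> mapPageCount(String value) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(LEGACY_PAGE_COUNT, value);      // old key, to be deprecated
        metadata.put(NAMESPACED_PAGE_COUNT, value);  // new prefixed key
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = mapPageCount("12");
        System.out.println(metadata.get(LEGACY_PAGE_COUNT));
        System.out.println(metadata.get(NAMESPACED_PAGE_COUNT));
    }
}
```

Writing both keys costs a duplicate entry per value, but it lets callers migrate to the prefixed names at their own pace before the old ones are removed.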
> Consistent, namespaced definitions for office file related metadata
> -------------------------------------------------------------------
>
> Key: TIKA-929
> URL: https://issues.apache.org/jira/browse/TIKA-929
> Project: Tika
> Issue Type: Improvement
> Reporter: Nick Burch
> Attachments: tika_OOXMLOffice_namespaces.patch
>
>
> Currently, we have the MSOffice metadata definitions, which are a mixture of
> Properties and Strings, none of them namespaced. Despite the name, the keys
> apply to a wide range of Office documents (not just MS ones), and the keys
> are taken from a mixture of sources.
> Similar to TIKA-925 / TIKA-928, we should replace these with prefixed
> versions drawn from a few well-known, externally defined namespaces, then
> deprecate the old ones.