[ https://issues.apache.org/jira/browse/TIKA-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jörg Ehrlich updated TIKA-929:
------------------------------

    Attachment: tika_OOXMLOffice_namespaces.patch

This patch should help to resolve this issue. It contains the following:
* Definitions of the OOXML namespace properties in Tika-core, except for those properties that already have equivalent definitions in the Office namespace interface.
* Deprecation of the old properties in the MSOffice interface.
* Adjustment of the related parsers to additionally map to the new OOXML properties.
* Adjustment of the related tests.

> Consistent, namespaced definitions for office file related metadata
> -------------------------------------------------------------------
>
>                 Key: TIKA-929
>                 URL: https://issues.apache.org/jira/browse/TIKA-929
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Nick Burch
>         Attachments: tika_OOXMLOffice_namespaces.patch
>
>
> Currently, we have the MSOffice metadata definitions, which are a mixture of
> Properties and Strings, none of them namespaced. Despite the name, the keys
> apply to a wide range of Office documents (not just MS ones), and the keys
> are taken from a mixture of sources.
> Similar to TIKA-925 / TIKA-928, we should replace these with prefixed
> versions drawn from a few well-known, externally defined namespaces, then
> deprecate the old ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
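
For illustration only, the approach described in the patch summary (keep the old key, mark it deprecated, and have the parser additionally write the value under a new prefixed key) could be sketched roughly as below. This is not code from the attached patch and does not use the real Tika classes: the class name, the method name, both key strings, and the use of a plain Map as a metadata container are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; a plain Map stands in for a metadata container.
final class NamespacedMetadataSketch {

    // Old, un-namespaced key, kept so existing callers keep working (illustrative value).
    @Deprecated
    static final String LEGACY_APPLICATION_NAME = "Application-Name";

    // New key carrying a namespace prefix (prefix and name are illustrative).
    static final String APPLICATION_NAME = "extended-properties:Application";

    // Writes the same value under both the deprecated key and the prefixed key,
    // mirroring the "additionally map to the new properties" step.
    static Map<String, String> mapApplicationName(String value) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(LEGACY_APPLICATION_NAME, value);
        metadata.put(APPLICATION_NAME, value);
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = mapApplicationName("Microsoft Office Word");
        // Print every key/value pair that was written.
        metadata.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Writing both keys during a transition period lets existing consumers keep reading the old key while new code moves to the prefixed one; the old key can be removed once the deprecation period ends.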