Parse and return the complete set of custom document properties from MS Office documents
----------------------------------------------------------------------------------------
Key: TIKA-438
URL: https://issues.apache.org/jira/browse/TIKA-438
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 0.7
Reporter: Mads Hansen
All MS Office document custom properties should be parsed and returned in the
Metadata set. This would be consistent with how all HTML meta tags are parsed
and returned.
CustomProperties are already being parsed to produce the Metadata.LANGUAGE
property when document properties are normalized into the Dublin Core metadata
set. With minor modifications to the
org.apache.tika.parser.microsoft.SummaryExtractor class, the entire set of
custom properties could be obtained and added to the document metadata.
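A minimal Java sketch of the proposed mapping follows. It assumes the custom properties have already been read into a plain `Map<String, Object>` (in the real parser they would presumably come from Apache POI's HPSF document summary information) and models the target metadata as a `Map<String, String>`. The class `CustomPropertiesSketch`, the method `toMetadata`, and the `custom:` name prefix are hypothetical, chosen only for illustration. They are not taken from the issue or from Tika's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CustomPropertiesSketch {

    // Hypothetical prefix that keeps user-defined names from colliding
    // with standard metadata keys such as "language" or "title".
    static final String PREFIX = "custom:";

    /**
     * Copies every custom property into a metadata map, converting each
     * value to its string form. Entries with a null name or value are skipped.
     */
    static Map<String, String> toMetadata(Map<String, Object> customProperties) {
        // TreeMap gives a stable, sorted order for printing.
        Map<String, String> metadata = new TreeMap<>();
        for (Map.Entry<String, Object> e : customProperties.entrySet()) {
            if (e.getKey() == null || e.getValue() == null) {
                continue;
            }
            metadata.put(PREFIX + e.getKey(), e.getValue().toString());
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Stand-in for the custom properties of a parsed Office document.
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("Language", "en-US");
        props.put("Department", "Finance");
        props.put("Revision", 3);

        // prints {custom:Department=Finance, custom:Language=en-US, custom:Revision=3}
        System.out.println(toMetadata(props));
    }
}
```

Prefixing the names is one way to stop arbitrary user-defined keys from overwriting standard entries. Whether the real change should do this, or copy the names unchanged as the HTML meta tag handling does, is a design choice the issue leaves open.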