Parse and return the complete set of custom document properties from MS Office 
documents
----------------------------------------------------------------------------------------

                 Key: TIKA-438
                 URL: https://issues.apache.org/jira/browse/TIKA-438
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.7
            Reporter: Mads Hansen


All MS Office document custom properties should be parsed and returned in the 
Metadata set.  This would be consistent with how all HTML meta tags are parsed 
and returned.

Custom properties are already parsed to produce the Metadata.LANGUAGE 
property when document properties are normalized into the Dublin Core metadata 
set.  With minor modifications to the 
org.apache.tika.parser.microsoft.SummaryExtractor class, the entire set of 
custom properties could be read and added to the document metadata.
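
A rough sketch of the requested behaviour follows. It is not the actual 
SummaryExtractor code and uses neither Tika nor POI: a plain Map stands in for 
the custom properties read from the document, and another Map stands in for 
the Tika Metadata object. The class name, the toMetadata and format helpers, 
and the "custom:" key prefix are all assumptions made for illustration; the 
issue does not specify how the keys should be named.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the document's custom
// properties and for Tika's Metadata object. No Tika or POI API is used.
class CustomPropertySketch {

    // Assumed prefix, chosen here so that custom names cannot overwrite
    // standard metadata keys. The issue does not prescribe one.
    static final String PREFIX = "custom:";

    // Copies every custom property into the metadata map as a string,
    // instead of picking out a single property such as the language.
    static Map<String, String> toMetadata(Map<String, Object> customProperties) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : customProperties.entrySet()) {
            Object value = entry.getValue();
            if (entry.getKey() == null || value == null) {
                continue; // skip entries that have no usable name or value
            }
            metadata.put(PREFIX + entry.getKey(), format(value));
        }
        return metadata;
    }

    // Dates are written in ISO 8601 form; other values use their string form.
    static String format(Object value) {
        if (value instanceof Date) {
            return ((Date) value).toInstant().toString();
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("Department", "Legal");
        props.put("Reviewed", Boolean.TRUE);
        props.put("Revision", 3);
        toMetadata(props).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real patch the input would come from the document's property set and 
the output would be written to the Metadata instance passed to the parser; 
whether to prefix the keys at all is a design choice left to the implementer.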

