[ https://issues.apache.org/jira/browse/TIKA-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mads Hansen updated TIKA-438:
-----------------------------
Priority: Minor (was: Major)
> Parse and return the complete set of custom document properties from MS
> Office documents
> ----------------------------------------------------------------------------------------
>
> Key: TIKA-438
> URL: https://issues.apache.org/jira/browse/TIKA-438
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 0.7
> Reporter: Mads Hansen
> Priority: Minor
> Attachments: SummaryExtractor.java
>
>
> All custom properties in MS Office documents should be parsed and returned
> in the Metadata set. This would be consistent with how all HTML meta tags
> are parsed and returned.
> CustomProperties are already parsed to produce the Metadata.LANGUAGE
> property when document properties are normalized into the Dublin Core
> metadata set. With minor modifications to the
> org.apache.tika.parser.microsoft.SummaryExtractor class, the entire set of
> custom properties could be obtained and set on the document metadata.
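
The change proposed above amounts to copying every custom property into the metadata instead of picking out a single one. The following is only a rough sketch of that idea, not the attached SummaryExtractor.java patch: plain java.util maps stand in for POI's custom-property collection and for Tika's Metadata object so that it runs without either library. The class name, the method names, and the "custom:" key prefix are all hypothetical.

```java
import java.util.Date;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the POI custom-property
// collection and the Tika Metadata object. Not the real SummaryExtractor.
class CustomPropertySketch {

    // Assumed prefix that keeps custom names from colliding with standard keys.
    static final String PREFIX = "custom:";

    // Copies every non-null custom property into the metadata map as text.
    static void copyCustomProperties(Map<String, Object> customProperties,
                                     Map<String, String> metadata) {
        if (customProperties == null) {
            return; // the document carries no custom properties
        }
        for (Map.Entry<String, Object> entry : customProperties.entrySet()) {
            Object value = entry.getValue();
            if (entry.getKey() == null || value == null) {
                continue; // skip entries that cannot be represented
            }
            metadata.put(PREFIX + entry.getKey(), toText(value));
        }
    }

    // Dates are rendered in ISO-8601 (UTC); other types use their string form.
    static String toText(Object value) {
        if (value instanceof Date) {
            return ((Date) value).toInstant().toString();
        }
        return String.valueOf(value);
    }
}
```

Prefixing the keys is one possible design choice: it would let a custom property named, say, "title" coexist with the normalized Dublin Core title rather than overwrite it.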