David Pilato created TIKA-2227:
----------------------------------
Summary: Replacement of MSOffice#KEYWORDS for RTF and ODT docs
Key: TIKA-2227
URL: https://issues.apache.org/jira/browse/TIKA-2227
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.14
Reporter: David Pilato
Priority: Minor
I'm trying to extract metadata from different types of documents.
I was using {{metadata.get(MSOffice.KEYWORDS)}} for that, but it is marked as
{{Deprecated}} in favor of the {{Office}} class.
So I changed my code to use {{metadata.get(Office.KEYWORDS)}} instead.
This does not work for two types of documents:
* RTF:
https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.rtf
* ODT:
https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.odt
It seems that RTF and ODT keywords are stored under the {{"Keyword"}} metadata
name, although they should probably be stored under {{"meta:keyword"}}.
If needed, you can reuse the documents linked above as test cases.
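
Until the parsers agree on one name, a caller-side fallback could look roughly like the sketch below. It is only an illustration under stated assumptions: a plain {{Map}} stands in for Tika's {{Metadata}} object, the class and method names ({{KeywordLookup}}, {{findKeywords}}) are hypothetical, and the two key strings are simply the ones quoted in this report.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Tika: tries the expected metadata name
// first, then falls back to the name RTF and ODT documents seem to use.
public class KeywordLookup {

    // Key names as quoted in this report (assumed, not read from Tika itself).
    static final String EXPECTED_NAME = "meta:keyword";
    static final String LEGACY_NAME = "Keyword";

    // Returns the first non-empty keywords value found under either name.
    static Optional<String> findKeywords(Map<String, String> metadata) {
        List<String> candidates = Arrays.asList(EXPECTED_NAME, LEGACY_NAME);
        for (String name : candidates) {
            String value = metadata.get(name);
            if (value != null && !value.isEmpty()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the metadata extracted from an RTF or ODT file.
        Map<String, String> rtfLike = new HashMap<>();
        rtfLike.put(LEGACY_NAME, "keyword1, keyword2");

        System.out.println(findKeywords(rtfLike).orElse("(no keywords)"));
    }
}
```

With real Tika code the same order of lookups would apply to {{metadata.get(...)}}, but a fix in the parsers themselves would make this fallback unnecessary.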