Tim Allison created TIKA-1765:
---------------------------------
Summary: Some doc and docx files store multiple authors as a semicolon-delimited list
Key: TIKA-1765
URL: https://issues.apache.org/jira/browse/TIKA-1765
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
Priority: Trivial
It looks like doc and docx files store multiple authors in a single author
field, delimited by semicolons. We should split this value and add each
author as a separate value where appropriate.
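A minimal sketch of the splitting step follows. It assumes `;` is the only delimiter, that there is no escaping (see the note below), and that surrounding whitespace and empty entries should be dropped. `AuthorSplitter` and `splitAuthors` are hypothetical names, not existing Tika code. Each resulting name could then be added individually to the document's metadata (for example with `Metadata.add`) rather than stored as one combined string.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthorSplitter {

    // Hypothetical helper: splits a raw author field such as "Alice; Bob"
    // into individual names. Assumes ';' is the only delimiter and that
    // no escaping is applied to it.
    public static List<String> splitAuthors(String raw) {
        List<String> authors = new ArrayList<>();
        if (raw == null) {
            return authors;
        }
        for (String part : raw.split(";")) {
            String name = part.trim();
            // Skip empty entries produced by doubled or trailing semicolons.
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        return authors;
    }

    public static void main(String[] args) {
        System.out.println(splitAuthors("Alice Smith; Bob Jones;"));
        // prints [Alice Smith, Bob Jones]
    }
}
```

Trimming and skipping empty entries is a design choice: it tolerates values such as `"Alice;;Bob; "` at the cost of silently discarding blank names.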
Note: when I tried to add an author with a semicolon in the name, the result
was two authors; there doesn't appear to be any escaping.
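The consequence of having no escaping can be illustrated with a short sketch: a single name that contains a semicolon is indistinguishable from two names once it has been written to the field. The example value and the class and method names are hypothetical.

```java
public class SemicolonAmbiguity {

    // Hypothetical helper: counts authors the naive way, by splitting on ';'.
    public static int naiveAuthorCount(String stored) {
        return stored.split(";").length;
    }

    public static void main(String[] args) {
        // One author whose name contains a semicolon, stored without escaping.
        String stored = "Acme; Research Group";
        // A reader splitting on ';' sees two authors instead of one.
        System.out.println(naiveAuthorCount(stored)); // prints 2
    }
}
```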
We should check what happens in the other MS formats and with other
metadata fields that Dublin Core allows to be multivalued.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)