Tim Allison created TIKA-1765:
---------------------------------

Summary: Some doc and docx store multiple authors as semi-colon delimited list
Key: TIKA-1765
URL: https://issues.apache.org/jira/browse/TIKA-1765
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
Priority: Trivial
It looks like doc and docx files store multiple authors in a single author field, delimited by semicolons. We should parse this value and add multiple authors where appropriate.

Notes: when I tried to add an author with a semicolon in the name, the result was two authors, so there doesn't appear to be any escaping. We should also check what happens in the other MS formats and with other metadata items that are allowed to be multivalued in Dublin Core.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
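A minimal sketch of the proposed parsing step, assuming the raw author string has already been read from the document. The class `AuthorSplitter` and method `splitAuthors` are hypothetical names, not existing Tika code. It splits on ';', trims whitespace and drops empty entries. Because Office applies no escaping (see the note above), a semicolon inside a single name is indistinguishable from a delimiter and will always produce two authors.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthorSplitter {

    // Hypothetical helper: split a raw author field such as "Alice; Bob"
    // into individual names. No escaping is assumed, so every ';' is
    // treated as a delimiter.
    public static List<String> splitAuthors(String raw) {
        List<String> authors = new ArrayList<>();
        if (raw == null) {
            return authors;
        }
        for (String part : raw.split(";")) {
            String name = part.trim();
            // Skip empty segments, e.g. from a trailing ';' or "A;;B".
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        return authors;
    }

    public static void main(String[] args) {
        // Prints [Tim Allison, Jane Doe]
        System.out.println(splitAuthors("Tim Allison; Jane Doe;"));
    }
}
```

Inside a parser, each resulting name could then be added one at a time as a separate value of a multivalued property (for example via `Metadata.add`), rather than storing the joined string; whether that is the right call for every MS format still needs checking, as noted above.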