Tim Allison created TIKA-1765:
---------------------------------

             Summary: Some doc and docx store multiple authors as semi-colon delimited list
                 Key: TIKA-1765
                 URL: https://issues.apache.org/jira/browse/TIKA-1765
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison
            Priority: Trivial


It looks like doc and docx files store multiple authors in a single author 
field, delimited by semicolons.  We should parse this value and add multiple 
authors where appropriate.

Note: when I tried to add an author with a semicolon in the name, the result 
was two authors, so it doesn't look like there is any escaping going on.
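
A minimal sketch of the splitting step, assuming the raw author field arrives 
as a single String.  The class and method names (AuthorSplitter, splitAuthors) 
are hypothetical and not part of Tika; wiring the result into the parsers' 
metadata is left out.  Because no escaping appears to be in use, a name that 
itself contains a semicolon cannot be told apart from two authors, and this 
sketch splits it as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing Tika class.
final class AuthorSplitter {

    private AuthorSplitter() {}

    // Splits a semicolon-delimited author field into individual names.
    // Surrounding whitespace is trimmed and empty entries are dropped.
    // Assumption: there is no escape mechanism for a literal semicolon.
    static List<String> splitAuthors(String raw) {
        List<String> authors = new ArrayList<>();
        if (raw == null) {
            return authors;
        }
        for (String part : raw.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        return authors;
    }

    public static void main(String[] args) {
        // Two authors stored in one field.
        System.out.println(splitAuthors("Jane Doe; John Smith"));
        // A single name containing a semicolon is also split in two.
        System.out.println(splitAuthors("Doe; Jane"));
    }
}
```

Each returned name could then be added as a separate value of the author 
metadata field, rather than setting the field once with the raw string.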

We should check to see what's going on in the other MS formats and with other 
metadata items that are allowed to be multivalued in Dublin Core.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
