[ https://issues.apache.org/jira/browse/TIKA-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandre Madurell updated TIKA-1252:
-------------------------------------
Attachment: Sample.xmp, Sample.pdf
Thanks so much!
Attached is a blank sample PDF with the XMP metadata imported into it (just as
we do with the full documents).
In the meantime, I'll try modifying the schema and XMP data so that we use a
custom field for the document authors (those who wrote the article, book
review, letter to the editor, etc.) and leave Acrobat's creator field for the
publisher (single entry). If that works, we can check whether the parser's code
handles custom and non-custom fields differently.
Thanks again! I'll get back with the results of the test ASAP.
> Tika is not indexing all authors of a PDF
> -----------------------------------------
>
> Key: TIKA-1252
> URL: https://issues.apache.org/jira/browse/TIKA-1252
> Project: Tika
> Issue Type: Bug
> Components: metadata, parser
> Affects Versions: 1.4
> Environment: Ubuntu 12.04 (x64) Solr 4.6.0 (Amazon Web Services,
> Bitnami Stack)
> Reporter: Alexandre Madurell
> Attachments: Sample.pdf, Sample.xmp
>
>
> When submitting a PDF with this information in its XMP metadata:
> ...
> <dc:creator>
> <rdf:Bag>
> <rdf:li>Author 1</rdf:li>
> <rdf:li>Author 2</rdf:li>
> </rdf:Bag>
> </dc:creator>
> ...
> Only the first one appears in the collection:
> ...
> "author":["Author 1"],
> "author_s":"Author 1",
> ...
> In spite of having set the field to multiValued in the Solr schema:
> <field name="author" type="text_general" indexed="true" stored="true"
> multiValued="true"/>
> Let me know if there's any further specific information I could provide.
> Thanks in advance!
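
For reference, the behaviour the report expects (every rdf:li entry under
dc:creator surfacing as a separate author value) can be sketched outside Tika.
This is only an illustration using Python's standard library, not Tika's
parser code. The SAMPLE_XMP wrapper around the reported fragment and the
extract_creators helper are hypothetical names introduced for this sketch; the
namespace URIs are the standard Dublin Core and RDF ones.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for Dublin Core and RDF.
DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical minimal packet wrapped around the fragment from the report.
SAMPLE_XMP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:creator>
      <rdf:Bag>
        <rdf:li>Author 1</rdf:li>
        <rdf:li>Author 2</rdf:li>
      </rdf:Bag>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def extract_creators(xmp_xml):
    """Return every non-empty rdf:li value found under any dc:creator element."""
    root = ET.fromstring(xmp_xml)
    creators = []
    for creator in root.iter("{%s}creator" % DC_NS):
        # Collect all list items, whatever container (Bag, Seq) holds them.
        for item in creator.iter("{%s}li" % RDF_NS):
            text = (item.text or "").strip()
            if text:
                creators.append(text)
    return creators


if __name__ == "__main__":
    print(extract_creators(SAMPLE_XMP))
```

Running a check like this against the XMP packet embedded in the PDF may help
confirm that both entries are present in the file itself, which would point at
the extraction step rather than at the document or the Solr schema.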