[ 
https://issues.apache.org/jira/browse/TIKA-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1252:
------------------------------

    Comment: was deleted

(was: [~alexandre.madur...@gmail.com], you might also consider opening an issue 
on PDFBox if it is valid XMP to use a bag list (rdf:Bag) for creators.  

It looks like XMPSchemaDublinCore's getCreators() is expecting a sequence list 
(not a bag list).  So, getCreators() in XMPSchemaDublinCore works correctly on 
this:
{noformat}
 <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
         <dc:format>application/pdf</dc:format>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>TEST CREATOR</rdf:li>
            </rdf:Seq>
         </dc:creator>
...
{noformat}
but it fails on your example, which uses rdf:Bag instead of rdf:Seq.  Either way, 
we need to update Tika's extraction, but it would be great to fix this in PDFBox 
too if it is actually an issue there.)
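
As a rough illustration of the difference described above, here is a minimal sketch of container-agnostic extraction. It is not Tika's or PDFBox's code: it uses Python's standard xml.etree.ElementTree, and the names extract_creators, sample and TEMPLATE are made up for this example. It assumes the XMP packet has already been pulled out of the PDF as a string, and it simply collects every rdf:li under dc:creator whether the container is rdf:Seq or rdf:Bag.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core as they appear in XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical test fixture: a tiny RDF wrapper around one dc:creator list.
TEMPLATE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
    <dc:creator>
      <rdf:{kind}>{items}</rdf:{kind}>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def sample(kind, names):
    """Build a sample packet whose creators sit in an rdf:<kind> container."""
    items = "".join("<rdf:li>%s</rdf:li>" % name for name in names)
    return TEMPLATE.format(kind=kind, items=items)


def extract_creators(xmp_xml):
    """Return all dc:creator entries, accepting either rdf:Seq or rdf:Bag."""
    root = ET.fromstring(xmp_xml)
    creators = []
    for creator in root.iter("{%s}creator" % NS["dc"]):
        # Check both container types instead of assuming a sequence list.
        for container in ("Seq", "Bag"):
            for li in creator.findall("rdf:%s/rdf:li" % container, NS):
                if li.text and li.text.strip():
                    creators.append(li.text.strip())
    return creators


if __name__ == "__main__":
    print(extract_creators(sample("Seq", ["TEST CREATOR"])))
    # prints ['TEST CREATOR']
    print(extract_creators(sample("Bag", ["Author 1", "Author 2"])))
    # prints ['Author 1', 'Author 2']
```

A lookup that only checks rdf:Seq would return an empty list for the second packet, which is the kind of mismatch described in this comment.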

> Tika is not indexing all authors of a PDF
> -----------------------------------------
>
>                 Key: TIKA-1252
>                 URL: https://issues.apache.org/jira/browse/TIKA-1252
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.4
>         Environment: Ubuntu 12.04 (x64) Solr 4.6.0 (Amazon Web Services, 
> Bitnami Stack)
>            Reporter: Alexandre Madurell
>         Attachments: Sample.pdf, Sample.xmp
>
>
> When submitting a PDF with this information in its XMP metadata:
> ...
>       <dc:creator>
>         <rdf:Bag>
>           <rdf:li>Author 1</rdf:li>
>           <rdf:li>Author 2</rdf:li>
>         </rdf:Bag>
>       </dc:creator>
> ...
> Only the first one appears in the collection:
> ...
>         "author":["Author 1"],
>         "author_s":"Author 1",
> ...
> In spite of having set the field to multiValued in the Solr schema:
> <field name="author" type="text_general" indexed="true" stored="true" 
> multiValued="true"/>
> Let me know if there's any further specific information I could provide.
> Thanks in advance! 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
