[ 
https://issues.apache.org/jira/browse/TIKA-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523586#comment-15523586
 ] 

Hudson commented on TIKA-2057:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1109 (See 
[https://builds.apache.org/job/Tika-trunk/1109/])
TIKA-2057 - maintain DocInfo metadata in PDFs (tallison: rev 
ce07d8a10499fae015f07ca4fd4daf3473ca5193)
* (edit) CHANGES.txt
* (add) tika-parsers/src/test/resources/test-documents/testPDF_diffTitles.pdf
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (add) tika-core/src/main/java/org/apache/tika/metadata/PDF.java


> Extract PDF DocInfo fields into separate metadata fields
> --------------------------------------------------------
>
>                 Key: TIKA-2057
>                 URL: https://issues.apache.org/jira/browse/TIKA-2057
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 1.13
>            Reporter: John Haynes
>            Assignee: Tim Allison
>            Priority: Minor
>         Attachments: int_Consumer_Conditions_of_use.pdf
>
>
> Hi,
> I have a PDF in which the title has been set twice -- once as Dublin Core 
> metadata: {code}<dc:title>
>   <rdf:Alt>
>     <rdf:li xml:lang="x-default">
>       Consumer credit cards - conditions of use
>     </rdf:li>
>   </rdf:Alt>
> </dc:title>{code}
> and again in the PDF DocInfo section: {code}
> /Title(Consumer Credit Card - Conditions of Use){code}
> When I use Tika to transform the PDF into HTML:
> {code}java -jar tika-app-1.13.jar int_Consumer_Conditions_of_use.pdf{code}
> it outputs this metadata:
> {code}<meta name="dc:title" content="Consumer credit cards - conditions of use"/>{code}
> and this <title> tag:
> {code}<title>Consumer credit cards - conditions of use</title>{code}
> meaning we no longer have access to the DocInfo title.
> Is there some way you could adapt Tika to copy this PDF DocInfo forward 
> during a conversion under a new type of metadata, e.g.
> {code}<meta name="docinfo:title" content="Consumer Credit Card - Conditions of Use"/>{code}
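
The request above amounts to keeping the XMP title and the DocInfo title
under separate metadata keys instead of letting one overwrite the other.
A minimal, self-contained sketch of that idea follows. It does not use
Tika's API and is not the committed fix: the class name TitleFields, the
regular expressions, and the key "docinfo:title" (taken from the
reporter's suggestion) are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleFields {
    // Hypothetical key names; "docinfo:title" is the reporter's suggestion,
    // not a confirmed Tika metadata key.
    static final String DC_TITLE = "dc:title";
    static final String DOCINFO_TITLE = "docinfo:title";

    // First <rdf:li> inside <dc:title> of an XMP packet.
    private static final Pattern XMP_TITLE = Pattern.compile(
            "<dc:title>.*?<rdf:li[^>]*>(.*?)</rdf:li>", Pattern.DOTALL);

    // Simple /Title(...) literal string. Escaped or nested parentheses,
    // which real PDFs may contain, are deliberately not handled here.
    private static final Pattern DOCINFO_TITLE_ENTRY =
            Pattern.compile("/Title\\s*\\(([^)]*)\\)");

    // Stores each title under its own key so neither value is lost.
    static Map<String, String> extractTitles(String xmp, String docInfo) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher x = XMP_TITLE.matcher(xmp);
        if (x.find()) {
            // Collapse the whitespace introduced by XML indentation.
            metadata.put(DC_TITLE, x.group(1).trim().replaceAll("\\s+", " "));
        }
        Matcher d = DOCINFO_TITLE_ENTRY.matcher(docInfo);
        if (d.find()) {
            metadata.put(DOCINFO_TITLE, d.group(1).trim());
        }
        return metadata;
    }

    public static void main(String[] args) {
        String xmp = "<dc:title>\n  <rdf:Alt>\n    <rdf:li xml:lang=\"x-default\">\n"
                + "      Consumer credit cards - conditions of use\n"
                + "    </rdf:li>\n  </rdf:Alt>\n</dc:title>";
        String docInfo = "/Title(Consumer Credit Card - Conditions of Use)";
        // Emit one <meta> element per key, mirroring the HTML output in the report.
        extractTitles(xmp, docInfo).forEach((key, value) ->
                System.out.println("<meta name=\"" + key + "\" content=\"" + value + "\"/>"));
    }
}
```

With the two snippets from the report as input, both titles survive side by
side, which is the behaviour being asked for.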



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
