[ https://issues.apache.org/jira/browse/PDFBOX-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966214#action_12966214 ]
Andreas Lehmkühler commented on PDFBOX-858:
-------------------------------------------
Martijn is correct: both PDFs are encrypted, and the metadata is available
once they are decrypted using the current trunk.
The ExtractMetadata [1] example was improved in revision 1041509. It now tries
to use the document information if the catalogue contains no metadata.
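To illustrate why the values look like gibberish until the document is
decrypted, here is a minimal, self-contained sketch. It does not use PDFBox and
is not the code from the revision above. It assumes RC4 (one of the ciphers a
PDF security handler can use) and a made-up key and author value; the attached
files may use a different cipher, and real PDF key derivation is more involved.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class EncryptedStringDemo {

    // RC4 is a symmetric stream cipher: applying it twice with the same key
    // returns the original bytes, so one helper both encrypts and decrypts.
    static byte[] rc4(byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ARCFOUR cipher not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical key and value, chosen only for illustration.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.ISO_8859_1);
        byte[] author = "Jane Doe".getBytes(StandardCharsets.ISO_8859_1);

        // What an encrypted file stores for the string.
        byte[] stored = rc4(key, author);

        // Reading the stored bytes without decrypting prints unreadable characters.
        System.out.println("Author=" + new String(stored, StandardCharsets.ISO_8859_1));

        // Decrypting first recovers the readable value.
        System.out.println("Author=" + new String(rc4(key, stored), StandardCharsets.ISO_8859_1));
    }
}
```

The first line printed is unreadable, much like the Author value in the report;
the second line is the readable value again.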
> Metadata extraction broken on some PDF files
> --------------------------------------------
>
> Key: PDFBOX-858
> URL: https://issues.apache.org/jira/browse/PDFBOX-858
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.2.1, 1.3.1
> Reporter: Patrik Stenmark
> Attachments: 2001Derivatives and Public Debt Mngt.pdf,
> RethinkingTheFinancialNetwork.pdf
>
>
> On certain PDF files (examples attached), the metadata extraction seems to be
> broken. Preview (on Mac OS X) and Acrobat Reader are able to read the
> metadata, but PDFBox gives complete gibberish:
> Author=è'ÿÆ??kÔ7??ÕªG?
> I've tried both the version included in Tika 0.7 (1.0.0 I believe) and
> r1021264 from SVN.