Metadata extraction broken on some PDF files
--------------------------------------------
Key: PDFBOX-858
URL: https://issues.apache.org/jira/browse/PDFBOX-858
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 1.2.1, 1.3.0
Reporter: Patrik Stenmark
Attachments: 2001Derivatives and Public Debt Mngt.pdf,
RethinkingTheFinancialNetwork.pdf
On certain PDF files (examples attached), metadata extraction appears to be
broken. Preview (on Mac OS X) and Acrobat Reader are able to read the metadata,
but PDFBox returns complete gibberish:
Author=è'ÿÆ??kÔ7??ÕªG?
I've tried both the version bundled with Tika 0.7 (PDFBox 1.0.0, I believe) and
r1021264 from SVN.