Wrong characters
----------------
Key: PDFBOX-964
URL: https://issues.apache.org/jira/browse/PDFBOX-964
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 1.4.0, 1.1.0, 1.5.0
Environment: Linux
Reporter: Bogdan Artyushenko
Attachments: nsw-solar-feed-in-tariff-report-to-ministers.pdf
I have a PDF document (PDF format 1.5), and when I try to read it with PDFBox,
the returned strings contain junk characters.
For example:
PDDocumentInformation info = doc.getDocumentInformation();
System.out.println("Title=" + info.getTitle());
System.out.println("Author=" + info.getAuthor());
System.out.println("Subject=" + info.getSubject());
System.out.println("Keywords=" + info.getKeywords());
System.out.println("Creator=" + info.getCreator());
System.out.println("Producer=" + info.getProducer());
System.out.println("Creation Date=" + info.getCreationDate());
Returns:
Title=o,¢bzbÜcqhg6cZêeG9øÀÈÕß¶¹àéXðA<\ÐhÔÑ®1
Author=o,¢v`
Subject=null
Keywords=null
Creator=Q÷P
b
b
h6tzeyúLc^àb ®4íÓ
¸ì
Producer=Q÷P
b
b
h6tze<Z¸R"
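When checking many files for this symptom, a rough heuristic can flag metadata values like the ones above. The following is only a sketch: the class `GarbledTextCheck` and the method `looksGarbled` are hypothetical helpers (not part of PDFBox), and the 25% threshold assumes the metadata is expected to be mostly ASCII. It will miss short junk strings and may misfire on legitimately non-English titles.

```java
public final class GarbledTextCheck {
    private GarbledTextCheck() {}

    // Hypothetical helper: returns true if the string contains control
    // characters (other than tab, newline, carriage return), the Unicode
    // replacement character, or more than 25% non-ASCII characters.
    public static boolean looksGarbled(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int nonAscii = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ordinaryWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (c == '\uFFFD' || (Character.isISOControl(c) && !ordinaryWhitespace)) {
                return true;
            }
            if (c > 0x7F) {
                nonAscii++;
            }
        }
        // Assumed threshold; tune it for the documents at hand.
        return (double) nonAscii / s.length() > 0.25;
    }

    public static void main(String[] args) {
        System.out.println(looksGarbled("Report to Ministers")); // false
        System.out.println(looksGarbled("Q\u00F7P\u0002b"));     // true (control character)
    }
}
```

Such a check could be applied to each value returned by `info.getTitle()`, `info.getAuthor()` and so on before the value is used.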
The same happens when I try to parse the file (I need to find all links in it).
For this I use:
for (final Iterator jt = annotations.iterator(); jt.hasNext();) {
    final PDAnnotation annot = (PDAnnotation) jt.next();
    if (!annot.isInvisible()) {
        if (annot instanceof PDAnnotationLink) {
            final PDAnnotationLink link = (PDAnnotationLink) annot;
            final PDAction action = link.getAction();
            if (action instanceof PDActionURI) {
                final PDActionURI uri = (PDActionURI) action;
                // ... (rest of the loop body not included in the report)
            }
        }
    }
}
And I get link values such as "N<»¬_f`Èø²\½8Ø,ÑBä<ÊÇ{".
But if I open the file with Document Viewer, Adobe Reader or Midnight Commander,
I don't see any problems there.
I have tested this with PDFBox versions 1.5, 1.4 and 1.1.
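As a stopgap while collecting links, values like the one quoted above could be filtered out by checking whether they parse as a URI with a scheme. This is a minimal sketch using only `java.net.URI` from the JDK. The class `LinkCheck` and the method `isPlausibleUri` are hypothetical names, and the check does not address why the strings are garbled in the first place.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class LinkCheck {
    private LinkCheck() {}

    // Hypothetical helper: true only if the value parses as a URI and has a
    // scheme (for example "http" or "mailto"). Characters such as '<', '{'
    // or '\' make java.net.URI reject the string.
    public static boolean isPlausibleUri(String value) {
        if (value == null) {
            return false;
        }
        try {
            URI uri = new URI(value.trim());
            return uri.getScheme() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleUri("http://example.com/report.pdf")); // true
        System.out.println(isPlausibleUri("N<abc{"));                        // false
    }
}
```

In the loop above, the string obtained from the `PDActionURI` would be passed to `isPlausibleUri` before being stored.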