Wrong characters
----------------

                 Key: PDFBOX-964
                 URL: https://issues.apache.org/jira/browse/PDFBOX-964
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.4.0, 1.1.0, 1.5.0
         Environment: Linux
            Reporter: Bogdan Artyushenko
         Attachments: nsw-solar-feed-in-tariff-report-to-ministers.pdf

I have a PDF document (PDF version 1.5), and when I try to process it, PDFBox 
returns junk characters. 

For example:
            PDDocumentInformation info = doc.getDocumentInformation();
            System.out.println("Title=" + info.getTitle());
            System.out.println("Author=" + info.getAuthor());
            System.out.println("Subject=" + info.getSubject());
            System.out.println("Keywords=" + info.getKeywords());
            System.out.println("Creator=" + info.getCreator());
            System.out.println("Producer=" + info.getProducer());
            System.out.println("Creation Date=" + info.getCreationDate());

Returns
Title=o,¢‘b‰zbÜcqhg­6cZêeGŸ9øÀÈÕß¶¹àéXð‡A<\Ðh„žÔ„Ñ®1
Author=o,¢‘v‰`
Subject=null
Keywords=null
Creator=Q“÷P…b
b…h6tzeyúLc^àb        ®4íÓ˜…¸ì
Producer=Q“÷P…b
b…h6tze<Z¸R"


The same happens when I parse the file (I need to find all the links in it).
For this I use:
            for (final Iterator jt = annotations.iterator(); jt.hasNext();) {
                final PDAnnotation annot = (PDAnnotation) jt.next();
                if (!annot.isInvisible()) {
                    if (annot instanceof PDAnnotationLink) {
                        final PDAnnotationLink link = (PDAnnotationLink) annot;
                        final PDAction action = link.getAction();
                        if (action instanceof PDActionURI) {
                            final PDActionURI uri = (PDActionURI) action;
                            // Assumed completion: the original snippet was cut off here.
                            System.out.println(uri.getURI());
                        }
                    }
                }
            }
And I got links of type "N<»¬_f`ȇœø²\½8Ø,ÑBä<ʓÇ{".

But if I open it with Document Viewer, Adobe Reader or Midnight Commander, I 
don't see any problems there.

I have tested this with PDFBox versions 1.1, 1.4 and 1.5.




