Robert Scharpf created PDFBOX-3814:
--------------------------------------

             Summary: PDFTextStripper extracts garbage
                 Key: PDFBOX-3814
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3814
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.6, 2.0.5
         Environment: Windows 7 64-bit, Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
            Reporter: Robert Scharpf
         Attachments: DataDirect Connect for ODBC User's Guide and Reference.pdf

Adobe Reader displays the attached PDF "DataDirect Connect for ODBC User's
Guide and Reference.pdf" without problems, but PDFTextStripper extracts
garbage from it.
First 256 characters of the extracted text (each character followed by its
hex code):
 000d 
 000d 
 000d 
 000d 
 000d 
 000d 
 000d 
 000d 
 000d  0001 B 0042 O 004f E 0045  0001 4 0034 F 0046 R 0052 V 0056 F 0046
- 002d J 004a O 004f L 004c  0001 B 0042 S 0053 F 0046  0001 S 0053 F 0046
H 0048 J 004a T 0054 U 0055 F 0046
I have a few more PDFs with the same symptom.
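
For anyone trying to reproduce this, a dump in the "char + hex code" form shown above could be produced roughly as follows. This is only a sketch: the class name CharHexDump and the method dump() are made up for illustration and are not part of PDFBox, and control characters are printed as '.' (an assumption, the dump above prints them raw). The PDFBox 2.0.x calls that would supply the text are shown only in a comment, so the snippet itself needs nothing beyond the JDK.

```java
// Sketch only: CharHexDump and dump() are hypothetical names, not part of PDFBox.
final class CharHexDump {

    /** Formats up to limit UTF-16 code units of text as "char hex" pairs. */
    static String dump(String text, int limit) {
        StringBuilder out = new StringBuilder();
        int n = Math.min(limit, text.length());
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            if (i > 0) {
                out.append(' ');
            }
            // Assumption: show control characters as '.' so the dump stays on one line.
            out.append(Character.isISOControl(c) ? '.' : c);
            // Four-digit lowercase hex of the UTF-16 code unit, e.g. 000d for CR.
            out.append(' ').append(String.format("%04x", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real reproduction the string would come from PDFBox 2.0.x, e.g.:
        //   try (PDDocument doc = PDDocument.load(new File(path))) {
        //       text = new PDFTextStripper().getText(doc);
        //   }
        // Here a short sample taken from the start of the reported output is used.
        String text = "\r\u0001BOE";
        System.out.println(dump(text, 256));
    }
}
```

Running the same dump over the first 256 characters of the text extracted from one of the other affected PDFs should make it easy to compare the symptom across files.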



