Robert Scharpf created PDFBOX-3814:
--------------------------------------
Summary: PDFTextStripper extracts garbage
Key: PDFBOX-3814
URL: https://issues.apache.org/jira/browse/PDFBOX-3814
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.6, 2.0.5
Environment: Windows 7 64-bit, Java
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Reporter: Robert Scharpf
Attachments: DataDirect Connect for ODBC User's Guide and Reference.pdf
Adobe Reader shows no problems with the attached PDF "DataDirect Connect for
ODBC User's Guide and Reference.pdf".
First 256 characters of extracted text (char + hex code) from PDFTextStripper:
000d
000d
000d
000d
000d
000d
000d
000d
000d 0001 B 0042 O 004f E 0045 0001 4 0034 F 0046 R 0052 V 0056 F 0046 -
002d J 004a O 004f L 004c 0001 B 0042 S 0053 F 0046 0001 S 0053 F 0046 H
0048 J 004a T 0054 U 0055 F 0046
I have a few more PDFs with the same symptom.
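
For anyone who wants to produce a comparable dump from their own files, here is a minimal sketch of a "char + hex code" formatter. It is not the code used for the output above. The class name CharHexDump and the method dump are hypothetical, and the rule for control characters (print only the hex code) is an assumption based on how 000d and 0001 appear in the dump. The input string would be whatever PDFTextStripper returned for the document.

```java
// Hypothetical helper, not part of PDFBox: formats the first `limit` chars
// of a string as "char hex" pairs, similar to the dump in this report.
final class CharHexDump {

    static String dump(String text, int limit) {
        StringBuilder out = new StringBuilder();
        int n = Math.min(limit, text.length());
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            if (i > 0) {
                out.append(' ');
            }
            // Assumption: control characters are shown by hex code only,
            // since they have no visible glyph.
            if (!Character.isISOControl(c)) {
                out.append(c).append(' ');
            }
            // Four lowercase hex digits per UTF-16 code unit.
            out.append(String.format("%04x", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In practice `text` would be the string extracted from the PDF.
        String text = "\r\u0001BOE";
        System.out.println(dump(text, 256));
    }
}
```

Dumping code units this way makes it easy to tell whether the extracted characters are wrong individually or only appear wrong in a viewer.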