Jaume Xaus created PDFBOX-4782:
----------------------------------
Summary: PDFLayoutTextStripper(), error on getText() method
Key: PDFBOX-4782
URL: https://issues.apache.org/jira/browse/PDFBOX-4782
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.18
Environment: Windows 10
Java Version 1.8.0_241
Reporter: Jaume Xaus
Attachments: AN20-0149-0602201842.pdf
I have this code:
PDDocument pdDoc = PDDocument.load(file);
PDFTextStripper stripper = new PDFLayoutTextStripper();
String text = stripper.getText(pdDoc);
With some PDF documents, calling the getText() method throws this error:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.charAt(String.java:658)
at com.sagedillepasa.gestion.TextLine.isSpaceCharacterAtIndex(PDFLayoutTextStripper.java:269)
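The trace shows String.charAt() being called with index -1 from TextLine.isSpaceCharacterAtIndex(), in the com.sagedillepasa.gestion package. As a minimal sketch of a bounds-checked variant, the following is illustrative only: the class name GuardedTextLine and its single String field are hypothetical and not taken from the actual PDFLayoutTextStripper source, and treating an out-of-range index as "not a space" is an assumption about the desired behavior.

```java
// Hypothetical stand-in for the TextLine class named in the stack trace.
// It only illustrates guarding charAt() against out-of-range indexes.
final class GuardedTextLine {
    private final String line;

    GuardedTextLine(String line) {
        this.line = line;
    }

    // Returns true only when index is inside the line and holds a space.
    // Assumption: an out-of-range index (such as -1) counts as "not a space"
    // instead of raising StringIndexOutOfBoundsException.
    boolean isSpaceCharacterAtIndex(int index) {
        if (index < 0 || index >= line.length()) {
            return false;
        }
        return line.charAt(index) == ' ';
    }
}
```

A guard like this only avoids the exception. It does not explain why the index becomes negative for the attached document, so the extracted layout may still be wrong for the affected lines.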
--
This message was sent by Atlassian Jira
(v8.3.4#803005)