PDFBOX 1.2.1 Text parsing issue
-------------------------------
Key: PDFBOX-781
URL: https://issues.apache.org/jira/browse/PDFBOX-781
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.2.1
Environment: Windows XP, Java
Reporter: Mahesh
Priority: Trivial
Hi,
Thanks a lot for PDFBOX.
I have been using PDFBox 1.2.1 for text parsing. I have customized my text
parsing class by extending the PDFTextStripper class.
The issue is: although I am able to get all the required string data (such as
x/y position, width, height, font name, font size), the text extracted using
the TextPosition object's getCharacter() contains the full text line except
for the last character. This last character appears as the text of the next line.
Ex: (Line in PDF):  "My name is Mahesh"
    (Parsed data): "My name is Mahes"
                   "h"
Please help me in this regard.
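One way the symptom above could arise is sketched below. It assumes the custom class starts a new line whenever a glyph's y position differs from the previous glyph's, which is only a guess at the cause and is not confirmed by this report. The sketch runs without PDFBox: Glyph, groupIntoLines and sample are hypothetical names standing in for values read from TextPosition, and the coordinates are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LineGrouping {
    // Hypothetical stand-in for the character and y position read from a TextPosition.
    static final class Glyph {
        final String character;
        final float y;

        Glyph(String character, float y) {
            this.character = character;
            this.y = y;
        }
    }

    // Starts a new line whenever y moves by more than `tolerance`
    // relative to the previous glyph.
    static List<String> groupIntoLines(List<Glyph> glyphs, float tolerance) {
        List<String> lines = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        float lastY = 0f;
        for (Glyph g : glyphs) {
            boolean newLine = current.length() > 0
                    && Math.abs(g.y - lastY) > tolerance;
            if (newLine) {
                lines.add(current.toString());
                current.setLength(0);
            }
            current.append(g.character);
            lastY = g.y;
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    // Invented data: every glyph sits at y = 100.0 except the last one,
    // which is reported 0.5 units away.
    static List<Glyph> sample() {
        List<Glyph> glyphs = new ArrayList<Glyph>();
        String text = "My name is Mahesh";
        for (int i = 0; i < text.length(); i++) {
            float y = (i == text.length() - 1) ? 100.5f : 100.0f;
            glyphs.add(new Glyph(String.valueOf(text.charAt(i)), y));
        }
        return glyphs;
    }
}
```

With a tolerance of 0 the sample splits into "My name is Mahes" and "h", matching the parsed data above; with a tolerance of 1.0 it stays on one line. Printing the y value next to each getCharacter() result in the real subclass would show whether this is what actually happens.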