Problem extracting text in newline characters
---------------------------------------------

                 Key: PDFBOX-588
                 URL: https://issues.apache.org/jira/browse/PDFBOX-588
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 0.8.0-incubator
         Environment: Win XP
            Reporter: Hesham


Hello,

I have a single-page PDF file. When I try to extract its text using:
String pageData = stripper.getText( pdfFile );

It ignores some of the line breaks between lines, so the last word of one line
and the first word of the next line are joined into a single word with no space
between them.

However, if I copy the text manually from the PDF and paste it into a text
editor, line breaks appear after the same lines that caused the problem in
PDFBox.
You can download the PDF file from here to try it:
http://www.4shared.com/file/185259485/5d937eb/Enters-sample.html
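To make the symptom concrete, here is a minimal plain-Java sketch. It does not call PDFBox and says nothing about the actual cause inside the library; the class name, the joinLines helper, and the sample lines are all made up for illustration. It only shows the difference between the output I get (lines joined with nothing in between) and the output I expect (lines joined with a line break).

```java
import java.util.Arrays;
import java.util.List;

public class LineBreakSymptom {
    // Hypothetical helper: joins the lines of a page with the given separator.
    static String joinLines(List<String> lines, String separator) {
        return String.join(separator, lines);
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for two consecutive lines of the PDF page.
        List<String> lines = Arrays.asList("first line of text", "second line of text");

        // What I observe: the line break is missing, so "text" and "second" run together.
        System.out.println(joinLines(lines, ""));

        // What I expect: each line ends with a line break, as in a manual copy and paste.
        System.out.println(joinLines(lines, System.lineSeparator()));
    }
}
```

In the first call the last word of one line and the first word of the next become a single token, which is exactly what breaks any later word-based processing of the extracted text.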
 
Is there a way to fix this?

Best regards,

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
