NathanJ created PDFBOX-4769:
-------------------------------

             Summary: Problem pdf version 1.4
                 Key: PDFBOX-4769
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4769
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.17
         Environment: java, maven
            Reporter: NathanJ


Here is my problem. I have to read PDF files and I decided to use PDFBox. I'm 
using the following code to read my file line by line and perform some actions 
on each line:

File tempFile = new File("myPdfFile");

try (PDDocument document = PDDocument.load(tempFile)) {
    if (!document.isEncrypted()) {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);
        PDFTextStripper tStripper = new PDFTextStripper();
        // Extract the text of the whole document as one string
        String pdfFileInText = tStripper.getText(document);
        // Split on \n or \r\n to get the individual lines
        String[] lines = pdfFileInText.split("\\r?\\n");
        // ... actions on each line ...
    }
}

For a PDF in format version 1.7, everything works well. But sometimes I have to 
work with PDF version 1.4, and then there is a problem: PDFTextStripper is 
unable to read the PDF, and my "pdfFileInText" gets the value "\r\n\r\n" and 
nothing else.
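
To make the symptom easy to detect in code, here is a minimal sketch using only the Java standard library. The class ExtractionCheck and its methods hasText and nonBlankLines are hypothetical names introduced for illustration; they are not part of PDFBox. The sketch assumes the input is the string returned by getText() and only shows how the "\r\n\r\n" result behaves with the split pattern used above; it does not explain why the extraction comes back empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of PDFBox.
class ExtractionCheck {

    // True if the extracted text contains at least one non-whitespace character.
    static boolean hasText(String extracted) {
        return extracted != null && !extracted.trim().isEmpty();
    }

    // Splits on \n or \r\n (same pattern as in the report) and drops blank lines.
    static List<String> nonBlankLines(String extracted) {
        List<String> result = new ArrayList<>();
        if (extracted == null) {
            return result;
        }
        for (String line : extracted.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The value reported for the version 1.4 file
        String fromPdf14 = "\r\n\r\n";
        System.out.println(hasText(fromPdf14));                        // prints "false"
        System.out.println(nonBlankLines("first\r\nsecond\n").size()); // prints "2"
    }
}
```

With such a check, a file that yields only line separators can be reported or skipped before the per-line actions run, instead of silently producing no lines.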

 

I didn't find any solutions on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
