[ https://issues.apache.org/jira/browse/PDFBOX-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034691#comment-17034691 ]
Tilman Hausherr commented on PDFBOX-4769:
-----------------------------------------

Please attach the PDF. Also read this: https://pdfbox.apache.org/2.0/faq.html#text-extraction

> Problem pdf version 1.4
> -----------------------
>
>                 Key: PDFBOX-4769
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4769
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.17
>         Environment: java, maven,
>            Reporter: NathanJ
>            Priority: Blocker
>
> Here is my problem. I have to read PDF files and I decided to use PDFBox. I'm
> using the following code to read my file line by line and execute some actions
> on each line:
>
>     import java.io.File;
>     import org.apache.pdfbox.pdmodel.PDDocument;
>     import org.apache.pdfbox.text.PDFTextStripper;
>     import org.apache.pdfbox.text.PDFTextStripperByArea;
>
>     File tempFile = new File("myPdfFile");
>     try (PDDocument document = PDDocument.load(tempFile))
>     {
>         if (!document.isEncrypted())
>         {
>             PDFTextStripperByArea stripper = new PDFTextStripperByArea();
>             stripper.setSortByPosition(true);
>             PDFTextStripper tStripper = new PDFTextStripper();
>             String pdfFileInText = tStripper.getText(document);
>             String lines[] = pdfFileInText.split("\\r?\\n");
>             // ... actions on each line ...
>         }
>     }
>
> For a PDF in format version 1.7, everything works well. But sometimes I have
> to work with PDF version 1.4, and then there is a problem: the
> PDFTextStripper is unable to read the PDF, and my "pdfFileInText" gets this
> value: "\r\n\r\n" and nothing else.
>
> I didn't find any solutions on the web.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
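The reported symptom (getText returning only "\r\n\r\n") can be detected before any per-line processing starts. The following is a minimal, JDK-only sketch; the class `ExtractedTextCheck` and the method `nonBlankLines` are hypothetical names, not PDFBox API. It reuses the split regex from the issue and shows that, because `String.split` drops trailing empty strings, `lines[]` comes out empty for that output, so an "all lines blank" check is a cheap way to flag a PDF whose text could not be extracted. It does not explain why extraction failed; for that, the PDF itself and the FAQ linked in the comment are needed.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractedTextCheck {

    /**
     * Splits extracted text on \r\n or \n (same regex as in the issue)
     * and keeps only lines that contain something besides whitespace.
     * Hypothetical helper, not part of PDFBox.
     */
    static List<String> nonBlankLines(String extracted) {
        List<String> result = new ArrayList<>();
        for (String line : extracted.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The value reported in the issue: two line separators and nothing else.
        String reported = "\r\n\r\n";
        // String.split removes trailing empty strings, so this array is empty.
        System.out.println(reported.split("\\r?\\n").length);
        // An empty result is the signal to stop and inspect the PDF.
        System.out.println(nonBlankLines(reported).isEmpty());

        // Normal output keeps its real lines; blank ones are skipped.
        String normal = "first line\r\nsecond line\n\n";
        System.out.println(nonBlankLines(normal));
    }
}
```

Running `main` prints `0`, `true` and `[first line, second line]`.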