NathanJ created PDFBOX-4769:
-------------------------------
Summary: Problem pdf version 1.4
Key: PDFBOX-4769
URL: https://issues.apache.org/jira/browse/PDFBOX-4769
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.17
Environment: Java, Maven
Reporter: NathanJ
Here is my problem. I have to read PDF files and decided to use PDFBox. I'm
using the following code to read my file line by line and perform some actions
on each line:
File tempFile = new File("myPdfFile"); // placeholder path
try (PDDocument document = PDDocument.load(tempFile)) {
    if (!document.isEncrypted()) {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);
        PDFTextStripper tStripper = new PDFTextStripper();
        String pdfFileInText = tStripper.getText(document);
        String[] lines = pdfFileInText.split("\\r?\\n");
        // ... process each line ...
    }
}
For a PDF in format version 1.7, everything works well. But sometimes I have to
work with PDF version 1.4, and then there is a problem: PDFTextStripper is
unable to read the PDF, and my "pdfFileInText" variable contains only
"\r\n\r\n".
I didn't find any solution on the web.
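
To make the symptom concrete, here is a minimal sketch of how the split step
from the code above behaves on the reported value, using only the Java
standard library (no PDFBox). The class name LineSplitDemo, the helper
splitLines, and the sample strings are hypothetical and only serve as an
illustration; the regular expression is the one from the code above.

```java
import java.util.Arrays;

public class LineSplitDemo {

    // Hypothetical helper: applies the same regex as the snippet in this report.
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        // Made-up sample of what a successful extraction might look like.
        String normal = "first line\r\nsecond line\n";
        // The value reported for the PDF 1.4 file.
        String reported = "\r\n\r\n";

        // prints [first line, second line]
        System.out.println(Arrays.toString(splitLines(normal)));

        // String.split drops trailing empty strings, so only separators
        // yield an empty array: prints 0
        System.out.println(splitLines(reported).length);
    }
}
```

So with the reported value, the "lines" array is empty and the per-line
processing never runs, which matches what I observe.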