[ https://issues.apache.org/jira/browse/PDFBOX-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778306#action_12778306 ]
Jignesh Sh commented on PDFBOX-547:
-----------------------------------

This issue was resolved after I switched to the following two latest PDFBox jar files:

pdfbox-0.8.0-incubating.jar
fontbox-0.8.0-incubating.jar

Thanks,
Jignesh

> problem in extracting text using PDFBox
> ---------------------------------------
>
>                 Key: PDFBOX-547
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-547
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.7.0
>            Reporter: Jignesh Sh
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Hi All,
> I am facing a problem extracting text using PDFBox.
> The program hangs at the line pdfText = stripper.getText(pdDoc); and returns
> nothing.
> Actually I am using PDFBox version PDFBox-0.6.7a.jar
> Here is my code:
>
> public String getPDFContent(ZipEntry pdfEntry)
> {
>     boolean status = false;
>     String pdfText = null;
>     ZipIssueFactory issueFactory = null;
>     logger.debug("Processing : " + pdfEntry.getName());
>     COSDocument cosDoc = null;
>     PDDocument pdDoc = null;
>     try
>     {
>         // Load InputStream into memory
>         cosDoc = parseDocument(zipFile.getInputStream(pdfEntry));
>
>         // skipping the PDF document, if it is encrypted
>         if (cosDoc.isEncrypted()) {
>             logger.warn("Can not decrypt PDF document w/o password, skipping:" + pdfEntry.getName());
>             return pdfText;
>         }
>         // extract PDF document's textual content
>         pdDoc = new PDDocument(cosDoc);
>         PDFTextStripper stripper = new PDFTextStripper();
>         pdfText = stripper.getText(pdDoc);
>     }
>     catch (IOException e) {
>         pdfText = null;
>         logger.error("IOException in parsing PDF document: " + e);
>     }
>     finally {
>         closeCOSDocument(cosDoc);
>         closePDDocument(pdDoc);
>     }
>     return pdfText;
> }
>
> private static COSDocument parseDocument(InputStream is) throws IOException {
>     PDFParser parser = new PDFParser(is);
>     parser.parse();
>     return parser.getDocument();
> }

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.