[ https://issues.apache.org/jira/browse/PDFBOX-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625906#comment-16625906 ]
Tim Allison commented on PDFBOX-4322:
-------------------------------------

I haven't had a chance to try this with pure PDFBox yet, but I can confirm that we're not getting the info in Tika 1.19: [^pdf__1.pdf.xml]

We do try to process the AcroForms and XFA (this doc doesn't appear to have XFA)...perhaps we're not doing it right?

> Extract Text feature is not working for some part of PDF
> --------------------------------------------------------
>
>                 Key: PDFBOX-4322
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4322
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.2
>            Reporter: Amit Maheshwari
>            Priority: Major
>         Attachments: pdf__1.pdf, pdf__1.pdf.xml
>
>
> Text Extraction feature cannot extract text from attached pdf properly.
>
> Text inside of rectangle box (e.g value of Lending Specialist and others) is
> not getting extracted.