[ https://issues.apache.org/jira/browse/PDFBOX-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625906#comment-16625906 ]
Tim Allison commented on PDFBOX-4322:
-------------------------------------
I haven't had a chance to try this with pure PDFBox yet, but I can confirm that
we're not getting that text in Tika 1.19: [^pdf__1.pdf.xml]. We do try to
process the AcroForms and XFA (this doc doesn't appear to have XFA). Perhaps
we're not doing it right?
> Extract Text feature is not working for some part of PDF
> --------------------------------------------------------
>
> Key: PDFBOX-4322
> URL: https://issues.apache.org/jira/browse/PDFBOX-4322
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.2
> Reporter: Amit Maheshwari
> Priority: Major
> Attachments: pdf__1.pdf, pdf__1.pdf.xml
>
>
> The text extraction feature cannot properly extract text from the attached PDF.
>
> Text inside the rectangular boxes (e.g., the value of Lending Specialist and
> others) is not being extracted.