[jira] [Commented] (PDFBOX-4322) Extract Text feature is not working for some part of PDF

Tim Allison (JIRA) Mon, 24 Sep 2018 07:39:10 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625906#comment-16625906
 ]


Tim Allison commented on PDFBOX-4322:
-------------------------------------

I haven't had a chance to try this with pure PDFBox yet, but I can confirm that 
we're not getting the info in Tika 1.19: [^pdf__1.pdf.xml]. We do try to 
process the AcroForms and XFA (this doc doesn't appear to have XFA), so perhaps 
we're not doing it right?

> Extract Text feature is not working for some part of PDF
> --------------------------------------------------------
>
>                 Key: PDFBOX-4322
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4322
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.2
>            Reporter: Amit Maheshwari
>            Priority: Major
>         Attachments: pdf__1.pdf, pdf__1.pdf.xml
>
>
> The text extraction feature cannot extract text from the attached PDF properly.
>
> Text inside the rectangular boxes (e.g. the value of Lending Specialist and
> others) is not being extracted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

