[ https://issues.apache.org/jira/browse/PDFBOX-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630826#comment-16630826 ]
ASF subversion and git services commented on PDFBOX-4322:
---------------------------------------------------------
Commit 1842132 from [email protected] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1842132 ]
PDFBOX-4322: treat identity ToUnicode streams that are empty as identity
> Extract Text feature is not working for some part of PDF
> --------------------------------------------------------
>
> Key: PDFBOX-4322
> URL: https://issues.apache.org/jira/browse/PDFBOX-4322
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.2, 2.0.11
> Reporter: Amit Maheshwari
> Priority: Major
> Fix For: 2.0.13, 3.0.0 PDFBox
>
> Attachments: PDFBOX-4322-Empty-ToUnicode-reduced.pdf, pdf__1.pdf,
> pdf__1.pdf.xml
>
>
> The text extraction feature cannot properly extract text from the attached PDF.
>
> Text inside the rectangular boxes (e.g. the value of Lending Specialist and
> others) is not extracted.