[ https://issues.apache.org/jira/browse/PDFBOX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844111#comment-16844111 ]
Tilman Hausherr commented on PDFBOX-4550:
-----------------------------------------
[^PDFBOX-3442-DirectResources_unc.pdf] I have that file in the regression
tests of PDFTextStripper (in {{pdfbox/src/test/resources/input}}), so the
stripper is called directly, without the check whether extraction is allowed. In
the past it produced text; after the change it no longer does. The reason
is that the interval is larger than 255 values. Another difference is that with
the previous version one could display the text bounds in PDFDebugger, which is
no longer possible. I've also attached an unencrypted version; it shows the same
problem with an unmodified ExtractText tool.
> Poor performance with corrupt ToUnicode stream
> ----------------------------------------------
>
> Key: PDFBOX-4550
> URL: https://issues.apache.org/jira/browse/PDFBOX-4550
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering, Text extraction
> Affects Versions: 2.0.15
> Reporter: Tilman Hausherr
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.16, 3.0.0 PDFBox
>
> Attachments: PDFBOX-3442-DirectResources.pdf,
> PDFBOX-3442-DirectResources_unc.pdf, pdnekz1gvl7.pdf
>
>
> A confidential file with lots of corrupt streams has a ToUnicode stream with
> corrupt contents in the beginbfrange segment, where the start and end of a
> range have different lengths. This leads to poor performance. Such entries can
> be skipped.
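
For illustration only, here is a minimal sketch of the skip rule described in the issue summary: a bfrange entry whose start and end differ in byte length is treated as corrupt and dropped. This is not the actual PDFBox code; the class {{BfRangeFilter}}, the holder {{Range}}, and the methods {{isWellFormed}} and {{dropCorrupt}} are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PDFBox implementation.
final class BfRangeFilter {

    /** Hypothetical holder for one bfrange entry: raw bytes of <start> and <end>. */
    static final class Range {
        final byte[] start;
        final byte[] end;

        Range(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Assumption: an entry is usable only if start and end have the same byte length. */
    static boolean isWellFormed(Range r) {
        return r.start.length == r.end.length;
    }

    /** Returns a new list containing only the well-formed entries, in their original order. */
    static List<Range> dropCorrupt(List<Range> ranges) {
        List<Range> kept = new ArrayList<>();
        for (Range r : ranges) {
            if (isWellFormed(r)) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```

With an input of one two-byte/two-byte entry and one one-byte/two-byte entry, {{dropCorrupt}} would keep only the first. The sketch deliberately does not limit the size of the interval, since the comment above reports that an interval of more than 255 values occurs in a file that used to extract correctly.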