[jira] [Commented] (PDFBOX-4550) Poor performance with corrupt ToUnicode stream

Tilman Hausherr (JIRA) Mon, 20 May 2019 09:59:38 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844111#comment-16844111
 ]


Tilman Hausherr commented on PDFBOX-4550:
-----------------------------------------

 [^PDFBOX-3442-DirectResources_unc.pdf] I have that file in the regression 
tests of PDFTextStripper (in {{pdfbox/src/test/resources/input}}), so the 
stripper is called directly, without the check whether extraction is allowed. 
Before the change it produced text; after the change it no longer does. The 
reason is that the interval is larger than 255 values. Another difference is 
that the previous version could display the text bounds in PDFDebugger, and 
the current one no longer can. I've also attached an unencrypted version, 
which shows the same problem with an unmodified ExtractText tool.
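
To illustrate why a limit on the interval size matters here, a minimal sketch, assuming a hypothetical check that rejects any bfrange entry covering more than 255 values. The class and method names are made up for illustration and are not PDFBox's parser code, and the start and end codes are example values, not taken from the attached file.

```java
import java.math.BigInteger;

// Hypothetical illustration only; not the PDFBox implementation.
class BfRangeSizeSketch {

    // Number of codes covered by a range (inclusive), reading the
    // start and end bytes as unsigned big-endian integers.
    static long rangeSize(byte[] start, byte[] end) {
        long s = new BigInteger(1, start).longValue();
        long e = new BigInteger(1, end).longValue();
        return e - s + 1;
    }

    // Assumed over-strict rule: drop every range with more than 255 values.
    static boolean rejectedByHypotheticalLimit(byte[] start, byte[] end) {
        return rangeSize(start, end) > 255;
    }

    public static void main(String[] args) {
        // A well-formed two-byte range <0020> <017F> (example values).
        byte[] start = {0x00, 0x20};
        byte[] end = {0x01, 0x7F};
        System.out.println("values in range: " + rangeSize(start, end));
        System.out.println("rejected: " + rejectedByHypotheticalLimit(start, end));
    }
}
```

Under that assumption, a valid two-byte range whose start and end have the same length would still be dropped, and the characters it maps would have no Unicode values for the stripper to emit.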

> Poor performance with corrupt ToUnicode stream
> ----------------------------------------------
>
>                 Key: PDFBOX-4550
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4550
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering, Text extraction
>    Affects Versions: 2.0.15
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.16, 3.0.0 PDFBox
>
>         Attachments: PDFBOX-3442-DirectResources.pdf, 
> PDFBOX-3442-DirectResources_unc.pdf, pdnekz1gvl7.pdf
>
>
> A confidential file with lots of corrupt streams has a ToUnicode stream with 
> corrupt contents in the beginbfrange segment, where start and end have 
> different lengths. This leads to poor performance. Such entries can be 
> skipped.
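
The skipping described in the issue text could look roughly like the following. This is a minimal sketch with hypothetical names, assuming each bfrange entry has already been parsed into a start and an end byte array; it is not the code from the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the PDFBox implementation.
class BfRangeLengthSketch {

    // A bfrange entry is treated as well-formed here only if its
    // start and end codes have the same number of bytes.
    static boolean hasMatchingLengths(byte[] start, byte[] end) {
        return start.length == end.length;
    }

    // Each entry is a {start, end} pair; mismatched entries are skipped
    // instead of being expanded into individual mappings.
    static List<byte[][]> keepWellFormed(List<byte[][]> entries) {
        List<byte[][]> kept = new ArrayList<>();
        for (byte[][] entry : entries) {
            if (hasMatchingLengths(entry[0], entry[1])) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<byte[][]> entries = new ArrayList<>();
        entries.add(new byte[][] {{0x00, 0x20}, {0x00, 0x7E}}); // lengths match
        entries.add(new byte[][] {{0x20}, {0x00, 0x7E}});       // corrupt: 1 byte vs 2 bytes
        System.out.println("kept entries: " + keepWellFormed(entries).size());
    }
}
```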



