[ 
https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226666#comment-17226666
 ] 

Andreas Lehmkühler commented on PDFBOX-5009:
--------------------------------------------

[~tilman] Looks good to me, with one small improvement for PDFs that consist 
of many pages. To minimize the number of elements in the set, it should be 
sufficient to store only the page tree nodes, i.e. the kids that have a 
/Kids entry of their own:
{code}
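if (set.contains(kid))
{
    // this intermediate node was reached before, skip it to avoid a loop
    LOG.error("This node has already been visited");
    continue;
}
else if (kid.containsKey(COSName.KIDS))
{
    // only remember intermediate nodes, leaf pages can't cause a cycle
    set.add(kid);
}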
{code}
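
To illustrate the idea outside of PDFBox, here is a minimal, self-contained 
sketch. It is not the actual PDFBox code: the Node class, collectPages() and 
the explicit stack are hypothetical stand-ins chosen for the example, and the 
identity-based set is an assumption about how nodes should be compared. It 
only shows that remembering the nodes with kids is enough to break a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class PageTreeWalk
{
    // Hypothetical stand-in for a PDF dictionary: an intermediate page tree
    // node has a kids list, a leaf page has none (kids == null).
    static class Node
    {
        final String name;
        List<Node> kids;

        Node(String name)
        {
            this.name = name;
        }

        boolean hasKids()
        {
            return kids != null;
        }
    }

    // Collects the leaf pages in document order. Only nodes that have kids
    // are stored in the set, so its size depends on the number of
    // intermediate nodes and not on the number of pages.
    static List<Node> collectPages(Node root)
    {
        List<Node> pages = new ArrayList<Node>();
        Set<Node> visited =
                Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> stack = new ArrayDeque<Node>();
        if (root.hasKids())
        {
            visited.add(root);
        }
        stack.push(root);
        while (!stack.isEmpty())
        {
            Node node = stack.pop();
            if (!node.hasKids())
            {
                pages.add(node);
                continue;
            }
            // push the kids in reverse so they are popped in document order
            for (int i = node.kids.size() - 1; i >= 0; i--)
            {
                Node kid = node.kids.get(i);
                if (visited.contains(kid))
                {
                    // already visited intermediate node: skip it
                    continue;
                }
                else if (kid.hasKids())
                {
                    visited.add(kid);
                }
                stack.push(kid);
            }
        }
        return pages;
    }

    public static void main(String[] args)
    {
        Node root = new Node("root");
        Node inner = new Node("inner");
        Node p1 = new Node("p1");
        Node p2 = new Node("p2");
        // corrupt tree: "inner" points back to the root
        inner.kids = Arrays.asList(p1, root);
        root.kids = Arrays.asList(inner, p2);
        for (Node page : collectPages(root))
        {
            System.out.println(page.name);
        }
    }
}
```

In this sketch a leaf page that is referenced twice is still collected twice, 
because leaves are never added to the set. That can't cause an endless loop, 
since a leaf has no kids to descend into.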


> Corrupt PDF can lead to a StackOverflow
> ---------------------------------------
>
>                 Key: PDFBOX-5009
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5009
>             Project: PDFBox
>          Issue Type: Task
>          Components: Text extraction
>    Affects Versions: 2.0.21
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.0.22, 3.0.0 PDFBox
>
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]


