[ https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226501#comment-17226501 ]

Tilman Hausherr commented on PDFBOX-5009:
-----------------------------------------

I'm able to catch this by keeping a set of already-visited nodes, so that enqueueKids() is never called recursively with the same node twice:
{code:java}
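    // Excerpt: only enqueueKids() changes; the rest of the iterator is omitted
    private final class PageIterator implements Iterator<PDPage>
    {
        private final Queue<COSDictionary> queue = new ArrayDeque<>();
        // nodes already reached through a /Kids array, used to break cycles in a corrupt page tree
        private final Set<COSDictionary> set = new HashSet<>();

        private PageIterator(COSDictionary node)
        {
            enqueueKids(node);
        }

        private void enqueueKids(COSDictionary node)
        {
            if (isPageTreeNode(node))
            {
                List<COSDictionary> kids = getKids(node);
                for (COSDictionary kid : kids)
                {
                    // ****** NEW **********
                    // Set.add() returns false if kid was already visited: skip it instead of
                    // recursing into it again, which never ends on a cyclic /Kids structure
                    if (!set.add(kid))
                    {
                        LOG.error("This node has already been visited");
                        continue;
                    }

                    enqueueKids(kid);
                }
            }
            else
            {
                queue.add(node);
            }
        }

        // hasNext() and next() are unchanged and omitted here
    }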
 {code}
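The same guard can be tried outside PDFBox. The sketch below is not PDFBox code: it assumes a hypothetical TreeNode class standing in for COSDictionary (a node with no kids plays the role of a page) and a hypothetical CycleSafeLeafCollector in place of PageIterator. As a design choice it uses an identity-based set, so distinct nodes are never merged even if equals() were overridden; the patch above uses a plain HashSet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for a page tree node (not a PDFBox class).
final class TreeNode
{
    final String name;
    final List<TreeNode> kids = new ArrayList<>(); // empty list means "leaf" (a page)

    TreeNode(String name)
    {
        this.name = name;
    }

    boolean isLeaf()
    {
        return kids.isEmpty();
    }
}

// Hypothetical collector mirroring the patched enqueueKids() logic.
public final class CycleSafeLeafCollector
{
    private final Queue<TreeNode> leaves = new ArrayDeque<>();
    // Identity-based "visited" set: membership is decided by reference, not equals().
    private final Set<TreeNode> visited =
            Collections.newSetFromMap(new IdentityHashMap<TreeNode, Boolean>());

    private void enqueueKids(TreeNode node)
    {
        if (!node.isLeaf())
        {
            for (TreeNode kid : node.kids)
            {
                // add() returns false when kid was seen before: skip it instead of recursing again
                if (!visited.add(kid))
                {
                    continue;
                }
                enqueueKids(kid);
            }
        }
        else
        {
            leaves.add(node);
        }
    }

    // Returns the names of all reachable leaves, each at most once, in traversal order.
    public static List<String> collectLeafNames(TreeNode root)
    {
        CycleSafeLeafCollector collector = new CycleSafeLeafCollector();
        collector.enqueueKids(root);
        List<String> names = new ArrayList<>();
        for (TreeNode leaf : collector.leaves)
        {
            names.add(leaf.name);
        }
        return names;
    }
}
```

With a corrupt tree where root has kids [a, mid] and mid has kids [b, root], collectLeafNames(root) returns [a, b]; without the visited check the recursion root, mid, root, mid, ... would continue until a StackOverflowError. Note that the guard only breaks cycles: the traversal is still recursive, so an extremely deep but acyclic tree could still exhaust the stack.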
 

> Corrupt PDF can lead to a StackOverflow
> ---------------------------------------
>
>                 Key: PDFBOX-5009
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5009
>             Project: PDFBox
>          Issue Type: Task
>          Components: Text extraction
>    Affects Versions: 2.0.21
>            Reporter: Tim Allison
>            Priority: Minor
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]



