[ https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226501#comment-17226501 ]
Tilman Hausherr commented on PDFBOX-5009:
-----------------------------------------
I'm able to catch this by using a set to prevent a recursive call with the same
parameter:
{code:java}
private final class PageIterator implements Iterator<PDPage>
{
    private final Queue<COSDictionary> queue = new ArrayDeque<>();
    // kids already seen; guards against cycles in a corrupt page tree
    private final Set<COSDictionary> set = new HashSet<>();

    private PageIterator(COSDictionary node)
    {
        enqueueKids(node);
    }

    private void enqueueKids(COSDictionary node)
    {
        if (isPageTreeNode(node))
        {
            List<COSDictionary> kids = getKids(node);
            for (COSDictionary kid : kids)
            {
                // ****** NEW **********
                if (set.contains(kid))
                {
                    LOG.error("This node has already been visited");
                    continue;
                }
                else
                {
                    set.add(kid);
                }
                enqueueKids(kid);
            }
        }
        else
        {
            queue.add(node);
        }
    }

    // ... remaining methods of the iterator not shown
}
{code}
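For anyone who wants to try the idea without a corrupt PDF, here is a minimal standalone sketch of the same visited-set guard. It does not use PDFBox: the Node class, the collectLeaves method, and the rule that a node without kids counts as a page are all hypothetical stand-ins for COSDictionary, the iterator, and isPageTreeNode. The sketch also uses an identity-based set and marks the root as visited up front; both are assumptions of this example, not part of the snippet above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class CycleGuardSketch
{
    /** Hypothetical stand-in for a page tree node; a node without kids counts as a page. */
    static final class Node
    {
        final String name;
        final List<Node> kids = new ArrayList<>();

        Node(String name)
        {
            this.name = name;
        }
    }

    /** Collects the names of all leaves reachable from root, visiting each node at most once. */
    static List<String> collectLeaves(Node root)
    {
        // Identity-based set (an assumption of this sketch): two nodes are "the same"
        // only if they are the same object.
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        visited.add(root);
        List<String> leaves = new ArrayList<>();
        enqueueKids(root, visited, leaves);
        return leaves;
    }

    private static void enqueueKids(Node node, Set<Node> visited, List<String> leaves)
    {
        if (node.kids.isEmpty())
        {
            leaves.add(node.name);
            return;
        }
        for (Node kid : node.kids)
        {
            // add() returns false if the kid was already in the set, so a cycle
            // is skipped instead of recursing until the stack overflows.
            if (!visited.add(kid))
            {
                continue;
            }
            enqueueKids(kid, visited, leaves);
        }
    }

    public static void main(String[] args)
    {
        Node root = new Node("root");
        Node inner = new Node("inner");
        root.kids.add(new Node("page1"));
        root.kids.add(inner);
        inner.kids.add(new Node("page2"));
        inner.kids.add(root); // corrupt: kid points back to an ancestor

        System.out.println(collectLeaves(root)); // prints [page1, page2]
    }
}
```

Without the visited check, the back reference from inner to root would make enqueueKids recurse without end, which is the StackOverflowError scenario described in this issue.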
> Corrupt PDF can lead to a StackOverflow
> ---------------------------------------
>
> Key: PDFBOX-5009
> URL: https://issues.apache.org/jira/browse/PDFBOX-5009
> Project: PDFBox
> Issue Type: Task
> Components: Text extraction
> Affects Versions: 2.0.21
> Reporter: Tim Allison
> Priority: Minor
>
> See TIKA-3224. I confirmed this with 2.0.21 by calling the app's ExtractText
> on the file posted on the Tika issue.
> cc [~dadoonet]