[ https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226486#comment-17226486 ]

Tilman Hausherr commented on PDFBOX-5009:
-----------------------------------------

I added some logging and stack tracing to see when it starts:
{noformat}
2020-11-05 05:19:14 WARN  PDPageTree:154 - i = 4, element is: COSObject{207, 0}
2020-11-05 05:19:14 WARN  PDPageTree:155 - COSDictionary expected, but got null
java.lang.Exception
        at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:157)
        at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:41)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:184)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:173)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:167)
        at org.apache.pdfbox.pdmodel.PDPageTree.iterator(PDPageTree.java:126)
        at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:289)
        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:241)
        at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:364)
        at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:267)
        at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:98)
        at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:57)
2020-11-05 05:19:14 WARN  PDPageTree:154 - i = 5, element is: COSObject{214, 0}
2020-11-05 05:19:14 WARN  PDPageTree:155 - COSDictionary expected, but got null
java.lang.Exception
        at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:157)
        at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:41)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:184)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:173)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:167)
        at org.apache.pdfbox.pdmodel.PDPageTree.iterator(PDPageTree.java:126)
        at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:289)
        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:241)
        at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:364)
        at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:267)
        at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:98)
        at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:57)
{noformat}
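
For illustration only: the trace shows enqueueKids calling itself while walking the page tree, with some kids resolving to null. Assuming the overflow comes from a kids array that loops back to an ancestor (which the log above does not confirm), a traversal could guard against both problems roughly as sketched below. The Node class and the collectPages and demoTree methods are hypothetical stand-ins, not PDFBox classes, and this is not the actual fix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class PageTreeWalk {
    /** Hypothetical stand-in for a page tree node; not a PDFBox class. */
    static final class Node {
        final String name;
        final boolean isPage;                 // true for a leaf page, false for an intermediate node
        final List<Node> kids = new ArrayList<>();

        Node(String name, boolean isPage) {
            this.name = name;
            this.isPage = isPage;
        }
    }

    /** Collects page names in document order, skipping null kids and nodes already visited. */
    static List<String> collectPages(Node root) {
        List<String> pages = new ArrayList<>();
        // Identity-based set: a node counts as "seen" only if it is the very same object.
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        // Explicit stack instead of recursion, so depth is not limited by the call stack.
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) {
            stack.push(root);
        }
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (!seen.add(node)) {
                continue;                     // cycle or duplicate reference: do not descend again
            }
            if (node.isPage) {
                pages.add(node.name);
                continue;
            }
            // Push kids in reverse so they are popped in their original order.
            for (int i = node.kids.size() - 1; i >= 0; i--) {
                Node kid = node.kids.get(i);
                if (kid != null) {            // a kid that failed to resolve is skipped
                    stack.push(kid);
                }
            }
        }
        return pages;
    }

    /** Builds a small broken tree: one null kid and one kid pointing back at the root. */
    static Node demoTree() {
        Node root = new Node("root", false);
        Node mid = new Node("mid", false);
        root.kids.add(new Node("p1", true));
        root.kids.add(null);
        root.kids.add(mid);
        mid.kids.add(new Node("p2", true));
        mid.kids.add(root);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(collectPages(demoTree())); // prints [p1, p2]
    }
}
```

The explicit stack bounds memory use by the number of distinct nodes rather than by recursion depth, and the visited set makes a self-referencing tree terminate instead of looping.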

> Corrupt PDF can lead to a StackOverflow
> ---------------------------------------
>
>                 Key: PDFBOX-5009
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5009
>             Project: PDFBox
>          Issue Type: Task
>          Components: Text extraction
>    Affects Versions: 2.0.21
>            Reporter: Tim Allison
>            Priority: Minor
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
