Owen McGovern created PDFBOX-5485:
-------------------------------------

             Summary: Stackoverflow writing out a subset of PDF pages - COSWriterObjectStream
                 Key: PDFBOX-5485
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5485
             Project: PDFBox
          Issue Type: Bug
          Components: Writing
    Affects Versions: 3.0.0 JBIG2
         Environment: MacOS, but likely not OS specific.
            Reporter: Owen McGovern


Version:  org.apache.pdfbox:pdfbox:3.0.0-alpha3

 

For a subset of the PDFs I process, I cannot extract a range of pages and write 
them out to a new PDF (this is done as part of test code).

Here's the Kotlin code I use 
{code:java}
import java.nio.file.Path
import java.nio.file.Paths
import org.apache.pdfbox.multipdf.PageExtractor

fun extractPages(documentName: String, fromPage: Int, toPage: Int): Path {
   val pdfFile = Paths.get("data", "input", "PDFS", "${documentName}.pdf")
   val pdfPagesFile = Paths.get("data", "input", "PDFS", "${documentName}_Page_$fromPage-$toPage.pdf")
   val pdfDoc = org.apache.pdfbox.Loader.loadPDF(pdfFile.toFile())
   // Copy the requested page range into a new document
   val pageExtractor = PageExtractor(pdfDoc, fromPage, toPage)
   val pdfPages = pageExtractor.extract()
   // Writing the extracted pages out is where the error surfaces
   pdfPages.save(pdfPagesFile.toFile())
   return pdfPagesFile
}{code}
It doesn't occur with all PDFs, maybe 10-20% of the ones I use.

 

A slice of the stack trace:
{code:java}
java.lang.StackOverflowError
    at java.base/java.util.HashMap.tableSizeFor(HashMap.java:380)
    at java.base/java.util.HashMap.<init>(HashMap.java:453)
    at java.base/java.util.LinkedHashMap.<init>(LinkedHashMap.java:347)
    at java.base/java.util.HashSet.<init>(HashSet.java:162)
    at java.base/java.util.LinkedHashSet.<init>(LinkedHashSet.java:154)
    at org.apache.pdfbox.util.SmallMap.entrySet(SmallMap.java:380)
    at org.apache.pdfbox.cos.COSDictionary.entrySet(COSDictionary.java:1225)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:336)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
 {code}
As mentioned, this affects only some PDFs, not all.

I am legally unable to share the original source PDFs, but this looks like 
unbounded recursion between writeObject, writeCOSDictionary and writeCOSArray 
in COSWriterObjectStream.
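
To illustrate the kind of failure suspected here, below is a minimal sketch of a recursive writer over nested dictionaries and arrays. It is not PDFBox code: plain java.util maps and lists stand in for COS objects, and the class name CycleSafeWriter and the "<ref>" marker are made up for illustration. The assumption is that some object graphs contain a cycle (for example a page pointing to its parent while the parent lists the page among its kids), so a writer that always descends never terminates. The sketch tracks the containers currently being written and stops descending when it meets one again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a recursive object writer; not the PDFBox implementation.
final class CycleSafeWriter {

    /** Serializes maps, lists and scalars; a container that is already being written becomes "<ref>". */
    static String write(Object root) {
        StringBuilder out = new StringBuilder();
        // Identity-based set: calling hashCode() on a cyclic map would itself recurse forever.
        Set<Object> open = Collections.newSetFromMap(new IdentityHashMap<>());
        write(root, out, open);
        return out.toString();
    }

    private static void write(Object node, StringBuilder out, Set<Object> open) {
        if (node instanceof Map) {
            if (!open.add(node)) {      // already on the current path: do not descend again
                out.append("<ref>");
                return;
            }
            out.append("<< ");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                out.append('/').append(entry.getKey()).append(' ');
                write(entry.getValue(), out, open);
                out.append(' ');
            }
            out.append(">>");
            open.remove(node);          // finished: siblings may write this container again
        } else if (node instanceof List) {
            if (!open.add(node)) {
                out.append("<ref>");
                return;
            }
            out.append("[ ");
            for (Object item : (List<?>) node) {
                write(item, out, open);
                out.append(' ');
            }
            out.append(']');
            open.remove(node);
        } else {
            out.append(String.valueOf(node));   // scalar leaf
        }
    }

    public static void main(String[] args) {
        // Build a cycle: page -> /Parent -> /Kids -> page
        Map<String, Object> page = new LinkedHashMap<>();
        Map<String, Object> parent = new LinkedHashMap<>();
        List<Object> kids = new ArrayList<>();
        kids.add(page);
        parent.put("Kids", kids);
        page.put("Parent", parent);
        System.out.println(write(page));
    }
}
```

Without the open-set check, the same input recurses through the dictionary and array branches until the stack is exhausted, which matches the alternating frames in the trace. A real writer would emit an indirect object reference at that point rather than a placeholder string.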



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
