[jira] [Created] (PDFBOX-5737) java.lang.ArrayIndexOutOfBoundsException Bug Report

Xiaohan Zhang (Jira) Wed, 13 Dec 2023 02:18:05 -0800

Xiaohan Zhang created PDFBOX-5737:
-------------------------------------

             Summary: java.lang.ArrayIndexOutOfBoundsException Bug Report
                 Key: PDFBOX-5737
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5737
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 3.0.0 PDFBox
            Reporter: Xiaohan Zhang
         Attachments: crash-38ee70b5cb74519b642c150694f601239f492168


We recently found a bug in the latest PDFBox release (3.0.0).
Because we lack detailed knowledge of the PDFBox internals, we cannot fix it
thoroughly ourselves, so we would welcome any fix the developers can propose.
 
# Test Program
 
package com.test;

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripper;

public class Entry {
   public static void main(String[] args) throws IOException {
      assert args.length == 1;
      File file = new File(args[0]);
      // try-with-resources closes the document even if a later step fails
      try (PDDocument document = Loader.loadPDF(file)) {
           // Print the basic document metadata
           PDDocumentInformation pdd = document.getDocumentInformation();
           System.out.println("Author of the document is :" + pdd.getAuthor());
           System.out.println("Title of the document is :" + pdd.getTitle());
           System.out.println("Subject of the document is :" + pdd.getSubject());
           // Touch the content stream of every page
           int noOfPages = document.getNumberOfPages();
           for (int i = 0; i < noOfPages; i++) {
               PDPage page = document.getPage(i);
               System.out.println("Page:" + i + ". Content: " + page.getContents());
           }
           // Extract the text of the whole document
           PDFTextStripper pdfStripper = new PDFTextStripper();
           String text = pdfStripper.getText(document);
           System.out.println("Full Content:" + text);
      } catch (IOException ignore) {
           // A malformed PDF is expected to fail with IOException;
           // any other exception escapes and counts as a crash
      }
      System.out.println("end test, no crash");
   }
}
 
# POC file
See the attachments.
 
# Crash Stack
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: arraycopy: length -1 is negative
at java.base/java.lang.System.arraycopy(Native Method)
at java.base/java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
at org.apache.pdfbox.filter.CCITTFaxFilter.decode(CCITTFaxFilter.java:75)
at org.apache.pdfbox.filter.Filter.decode(Filter.java:96)
at org.apache.pdfbox.filter.Filter.decode(Filter.java:238)
at org.apache.pdfbox.cos.COSStream.createView(COSStream.java:196)
at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:51)
at org.apache.pdfbox.pdfparser.BruteForceParser.bfSearchForObjStreams(BruteForceParser.java:336)
at org.apache.pdfbox.pdfparser.BruteForceParser.rebuildTrailer(BruteForceParser.java:838)
at org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:250)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:127)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:156)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:466)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:348)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:303)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:246)
at com.test.Entry.main(Entry.java:21)
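
The top frames show PushbackInputStream.unread passing a length of -1 to
System.arraycopy. One plausible way to get there, and this is only an
assumption about the calling code rather than a reading of CCITTFaxFilter
itself, is that the return value of a read() on an already exhausted stream
(-1) is handed straight to unread(). The JDK-only sketch below illustrates
that pattern. The class and method names are hypothetical and nothing in it
is taken from PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class PushbackPeekDemo {

    // Hypothetical helper: peek at up to 20 bytes, then push them back
    // without checking whether read() reported end of stream.
    static InputStream peekUnguarded(InputStream in) throws IOException {
        byte[] header = new byte[20];
        PushbackInputStream pushback = new PushbackInputStream(in, header.length);
        int bytesRead = pushback.read(header);   // -1 if the stream is already at EOF
        pushback.unread(header, 0, bytesRead);   // a negative length fails in arraycopy
        return pushback;
    }

    // Same helper, but only pushes back when something was actually read.
    static InputStream peekGuarded(InputStream in) throws IOException {
        byte[] header = new byte[20];
        PushbackInputStream pushback = new PushbackInputStream(in, header.length);
        int bytesRead = pushback.read(header);
        if (bytesRead > 0) {
            pushback.unread(header, 0, bytesRead);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        try {
            peekUnguarded(new ByteArrayInputStream(new byte[0]));
            System.out.println("unguarded: no exception");
        } catch (IndexOutOfBoundsException e) {
            // ArrayIndexOutOfBoundsException is a subclass of this type
            System.out.println("unguarded: " + e);
        }
        InputStream safe = peekGuarded(new ByteArrayInputStream(new byte[0]));
        System.out.println("guarded: read() returns " + safe.read());
    }
}
```

If the filter does follow this pattern, a check on the read count before the
unread() call would be one possible fix. Whether an empty stream should then
raise an IOException or decode to empty data is a decision for the developers.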



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

