Xiaohan Zhang created PDFBOX-5737:
-------------------------------------
Summary: java.lang.ArrayIndexOutOfBoundsException Bug Report
Key: PDFBOX-5737
URL: https://issues.apache.org/jira/browse/PDFBOX-5737
Project: PDFBox
Issue Type: Bug
Affects Versions: 3.0.0 PDFBox
Reporter: Xiaohan Zhang
Attachments: crash-38ee70b5cb74519b642c150694f601239f492168
Recently we discovered a bug in latest pdfbox (3.0.0).
Due to the lack of contextual knowledge in the pdfbox library, we cannot
thoroughly fix some bugs hence we look forward to any proposed plan from the
developers in fixing these bugs.
# Test Program
package com.test;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.Loader;
public class Entry {
public static void main (String args[]) throws IOException {
assert args.length == 1;
try {
File file = new File(args[0]);
PDDocument document = Loader.loadPDF(file);
PDDocumentInformation pdd = document.getDocumentInformation();
System.out.println("Author of the document is :"+ pdd.getAuthor());
System.out.println("Title of the document is :"+ pdd.getTitle());
System.out.println("Subject of the document is :"+ pdd.getSubject());
int noOfPages= document.getNumberOfPages();
for (int i = 0; i < noOfPages; i++) {
PDPage page_doc = document.getPage(i);
System.out.println("Page:"+ i + ". Content: " +
page_doc.getContents());
}
PDFTextStripper pdfStripper = new PDFTextStripper();
String text = pdfStripper.getText(document);
System.out.println("Full Content:"+ text);
document.close();
} catch (java.io.IOException ignore) {
}
System.out.println("end test, no crash");
}
}
# POC file
See the attachments.
# Crash Stack
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: arraycopy:
length -1 is negative
at java.base/java.lang.System.arraycopy(Native Method)
at java.base/java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
at org.apache.pdfbox.filter.CCITTFaxFilter.decode(CCITTFaxFilter.java:75)
at org.apache.pdfbox.filter.Filter.decode(Filter.java:96)
at org.apache.pdfbox.filter.Filter.decode(Filter.java:238)
at org.apache.pdfbox.cos.COSStream.createView(COSStream.java:196)
at
org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:51)
at
org.apache.pdfbox.pdfparser.BruteForceParser.bfSearchForObjStreams(BruteForceParser.java:336)
at
org.apache.pdfbox.pdfparser.BruteForceParser.rebuildTrailer(BruteForceParser.java:838)
at org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:250)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:127)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:156)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:466)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:348)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:303)
at org.apache.pdfbox.Loader.loadPDF(Loader.java:246)
at com.test.Entry.main(Entry.java:21)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]