[ https://issues.apache.org/jira/browse/PDFBOX-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679435#comment-13679435 ]
Andrew Dale commented on PDFBOX-1586:
-------------------------------------
Even though this bug has been marked as fixed in the 1.8.2 release of PDFBox,
in my opinion it is still there. A simplified test case is:
import java.io.File;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.junit.Test;

public class PdfBoxTest {

    @Test
    public void testPdfBox2() throws Exception {
        PDDocument returnDocument = new PDDocument();
        String outputFilename = "/tmp/output.pdf";
        List<Integer> pages = Arrays.asList(1, 2, 3, 4, 5);
        try {
            // get/load current document
            PDDocument currentPdf = PDDocument.load(new File("/tmp/input.pdf"));
            @SuppressWarnings("unchecked")
            List<PDPage> currentDocumentPages =
                    currentPdf.getDocumentCatalog().getAllPages();
            for (Integer currentPage : pages) {
                returnDocument.importPage(currentDocumentPages.get(currentPage - 1));
            }
            // Cause of the problem: everything works if currentPdf is closed only
            // after returnDocument.save() and returnDocument.close() have been called.
            currentPdf.close();
        } finally {
            returnDocument.save(outputFilename);
            returnDocument.close();
        }
    }
}
This gives me the following stacktrace:
org.apache.pdfbox.exceptions.COSVisitorException: java.lang.IndexOutOfBoundsException: Index: 72, Size: 0
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1354)
    at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:217)
    at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:206)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:525)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122)
    at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1292)
    at com.test.PdfBoxTest.testPdfBox2(PdfBoxTest.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.IndexOutOfBoundsException: Index: 72, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
    at org.apache.pdfbox.io.RandomAccessFileInputStream.read(RandomAccessFileInputStream.java:96)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1337)
    ... 34 more
I am using JDK 1.6.0_33 on 64-bit Linux (Ubuntu).
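The "Caused by" part of the trace shows RandomAccessBuffer.seek() calling get() on an ArrayList of size 0. That suggests (an inference, not something confirmed in this thread) that the imported pages still read their stream data through the source document's buffer, and that this buffer has already been emptied by currentPdf.close() by the time save() runs.

The stdlib-only Java sketch below models that ordering problem. ScratchBuffer, LazyStream and importThenSave are hypothetical names. This is a simplified model, not PDFBox's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the failure; NOT PDFBox's implementation.
public class ClosedSourceModel {

    // Stand-in for the source document's chunked scratch buffer.
    static final class ScratchBuffer {
        private final List<byte[]> chunks = new ArrayList<>();

        int append(String text) {
            chunks.add(text.getBytes(StandardCharsets.UTF_8));
            return chunks.size() - 1; // index of the chunk just stored
        }

        byte[] read(int index) {
            return chunks.get(index); // IndexOutOfBoundsException once the list is cleared
        }

        void close() {
            chunks.clear(); // like closing the source document: the data is discarded
        }
    }

    // Stand-in for an imported page's stream: it keeps a position in the
    // source buffer, not a copy of the bytes.
    static final class LazyStream {
        private final ScratchBuffer source;
        private final int index;

        LazyStream(ScratchBuffer source, int index) {
            this.source = source;
            this.index = index;
        }

        String contents() {
            return new String(source.read(index), StandardCharsets.UTF_8);
        }
    }

    // "Imports" pageCount pages by reference, then "saves" by reading every stream.
    static String importThenSave(int pageCount, boolean closeSourceBeforeSave) {
        ScratchBuffer source = new ScratchBuffer();
        List<LazyStream> imported = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            imported.add(new LazyStream(source, source.append("page " + page + ";")));
        }
        if (closeSourceBeforeSave) {
            source.close(); // the ordering used in the test case
        }
        StringBuilder saved = new StringBuilder();
        for (LazyStream stream : imported) {
            saved.append(stream.contents()); // stream data is only read at save time
        }
        source.close(); // safe here: everything has already been written
        return saved.toString();
    }

    public static void main(String[] args) {
        System.out.println(importThenSave(5, false)); // page 1;page 2;page 3;page 4;page 5;
        try {
            importThenSave(5, true);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("save failed: " + e.getMessage());
        }
    }
}
```

Under this model, the workaround noted in the test case (closing currentPdf only after returnDocument.save() and returnDocument.close()) works because every stream is read before its backing data is discarded.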
> IndexOutOfBoundsException when saving a document (at random)
> ------------------------------------------------------------
>
> Key: PDFBOX-1586
> URL: https://issues.apache.org/jira/browse/PDFBOX-1586
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.8.1
> Reporter: James Green
> Assignee: Andreas Lehmkühler
> Priority: Critical
> Fix For: 1.8.2
>
>
> Getting the following stacktrace:
> org.apache.pdfbox.exceptions.COSVisitorException: java.lang.IndexOutOfBoundsException: Index: 28, Size: 0
>     at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1245)
>     at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:201)
>     at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:206)
>     at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:524)
>     at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:434)
>     at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1056)
>     at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:496)
>     at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1392)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1157)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1138)
>     ...
> Caused by: java.lang.IndexOutOfBoundsException: Index: 28, Size: 0
>     at java.util.ArrayList.rangeCheck(ArrayList.java:604)
>     at java.util.ArrayList.get(ArrayList.java:382)
>     at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
>     at org.apache.pdfbox.io.RandomAccessFileInputStream.read(RandomAccessFileInputStream.java:96)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1232)
>
> I'll add some context. We have a "data pipeline" in which a Windows Print
> Monitor sends PostScript to a servlet, which then uses Ghostscript 9.05 to
> convert it to PDF in memory. This PDF is then loaded into PDFBox using
> PDDocument.load().
>
> At this point we split the original PDF into multiple smaller ones, each of
> which is saved to a ByteArrayOutputStream. We are having serious reliability
> problems at the point of save().
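The splitting step can be illustrated with a small helper that plans which 1-based page numbers go into each output document. Examples are 990 pages into 990 one-page documents, or four pages into two two-page documents.

The class and method names are hypothetical, and the helper only computes the page groups. The actual copying is left out. It could, for instance, use importPage() as in the test case above, which converts each 1-based number with get(currentPage - 1). The save to a ByteArrayOutputStream is also left out.

```java
import java.util.ArrayList;
import java.util.List;

public class PageRanges {

    // Hypothetical helper: groups pages 1..pageCount into consecutive runs of
    // pagesPerDocument pages (the last run may be shorter). Each inner list holds
    // the 1-based page numbers for one output document.
    static List<List<Integer>> split(int pageCount, int pagesPerDocument) {
        if (pagesPerDocument < 1) {
            throw new IllegalArgumentException("pagesPerDocument must be >= 1");
        }
        List<List<Integer>> documents = new ArrayList<>();
        for (int first = 1; first <= pageCount; first += pagesPerDocument) {
            int last = Math.min(first + pagesPerDocument - 1, pageCount);
            List<Integer> pages = new ArrayList<>();
            for (int page = first; page <= last; page++) {
                pages.add(page);
            }
            documents.add(pages);
        }
        return documents;
    }

    public static void main(String[] args) {
        System.out.println(split(4, 2));          // [[1, 2], [3, 4]]
        System.out.println(split(990, 1).size()); // 990
    }
}
```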
>
> We saved an original PDF from Ghostscript into a unit test to replicate the
> problem, without success. If we re-run the pipeline on the original PDF and
> split it again, the percentage of documents that save successfully appears
> to be random.
>
> For instance, with a 990-page document (text, no images) split into 990
> one-page documents, running on Tomcat 7 with -Xmx512m:
> Pass 1: 50% were saved, 50% ended with stack traces
> Pass 2: 100% were saved
> Pass 3: 100% were saved
> The same test with -Xmx128m ended several times with just one document
> saved; the rest ended with stack traces.
>
> We have also seen this randomly hit a four-page sample document being split
> into two two-page documents, so it does not appear to be memory-related. We
> also added code to catch the IndexOutOfBoundsException and retry the save up
> to ten times, but save() seems to either work the first time or not at all.
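The retry experiment described above can be sketched with a small, hypothetical helper (RetrySave, withRetries and attemptsToRead are illustrative names, not PDFBox API). The sketch assumes, as the "Size: 0" in the trace suggests, that the data backing the stream has already been released. Under that assumption, every retry hits the same empty list, which would fit the observation that save() either works the first time or not at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySave {

    // Hypothetical helper: runs the action up to maxAttempts times,
    // retrying only when it throws IndexOutOfBoundsException.
    static <T> T withRetries(Callable<T> action, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IndexOutOfBoundsException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IndexOutOfBoundsException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }

    // Counts how many attempts withRetries makes to read chunk `index` from `buffer`.
    static int attemptsToRead(List<byte[]> buffer, int index, int maxAttempts) {
        AtomicInteger attempts = new AtomicInteger();
        try {
            withRetries(() -> {
                attempts.incrementAndGet();
                return buffer.get(index); // fails identically each time if buffer is empty
            }, maxAttempts);
        } catch (Exception e) {
            // every attempt failed; retrying cannot restore discarded data
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        List<byte[]> live = new ArrayList<>();
        live.add(new byte[] {1, 2, 3});
        List<byte[]> discarded = new ArrayList<>(); // e.g. after the source was closed
        System.out.println("live buffer: " + attemptsToRead(live, 0, 10) + " attempt(s)");
        System.out.println("discarded buffer: " + attemptsToRead(discarded, 28, 10) + " attempt(s)");
    }
}
```

A retry only helps with transient faults. If the failure is deterministic once the buffer is empty, the fix has to be in the ordering or lifetime of the source data, not in the number of attempts.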
>
> We suspect there are environmental factors here, but we are now focused on
> getting this nailed down. Any advice or assistance would be welcome.