[ https://issues.apache.org/jira/browse/PDFBOX-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070601#comment-14070601 ]
Brandon Lyon edited comment on PDFBOX-2226 at 7/22/14 6:06 PM:
---------------------------------------------------------------
Edit: I used the PDF file attached to the link you provided, and I was using
the latest snapshot of 2.0 at the time.
It could be a combination of things that caused the problem. Here is how I
handle the documents, from loading the template to saving the completed,
merged document:
{code:java}
// The file is loaded into memory (essentially stored as a byte array within a
// Document object)
byte[] data;
// Data is read from the byte array using a ByteArrayInputStream
try (InputStream is = new ByteArrayInputStream (data))
{
    ...
}
// There are two processes involved: filling and merging. It's not clear which
// one did the " ) Tj 0 - 13 Td (" thing, but merging is definitely where the
// IndexOutOfBoundsException was occurring.
// The first process fills the PDF form fields and saves the result to a new
// byte array
final Document template;
final Map<String, Object> parameters;
// openInputStream returns a ByteArrayInputStream for the typical Document
// implementation. The declaration of os was not part of the original snippet;
// a ByteArrayOutputStream is assumed here.
try (InputStream is = template.openInputStream ();
     ByteArrayOutputStream os = new ByteArrayOutputStream ())
{
    final PDDocument document = PDDocument.load (is);
    try
    {
        final PDDocumentCatalog catalog = document.getDocumentCatalog ();
        final PDAcroForm form = catalog.getAcroForm ();
        for (final Map.Entry<String, Object> e : parameters.entrySet ())
        {
            final PDField field = form.getField (e.getKey ());
            if (field != null)
            {
                if (field instanceof PDCheckbox)
                {
                    // Accept either a Boolean or anything whose string form
                    // parses as a boolean
                    final Object value = e.getValue ();
                    final boolean boolValue = (value instanceof Boolean)
                        ? (Boolean) value
                        : Boolean.parseBoolean (value.toString ());
                    if (boolValue)
                        ((PDCheckbox) field).check ();
                    else
                        ((PDCheckbox) field).unCheck ();
                }
                else
                    field.setValue (e.getValue ().toString ());
            }
        }
        document.save (os);
    }
    finally
    {
        document.close ();
    }
}
catch (final IOException | COSVisitorException e)
{
    throw new RuntimeException ("Exception thrown while populating PDF field data", e);
}
// The second process reads from multiple byte arrays, one for each document,
// merges them, and saves the result to a new byte array
final List<InputStream> streamsToClose = new LinkedList<> ();
try
{
    final PDFMergerUtility mergePdf = new PDFMergerUtility ();
    for (final Document doc : this.parts)
    {
        Document pdf;
        if (MimeType.MIME_COMPARATOR.compare (doc.getMimeType (), "pdf") == 0)
            pdf = doc;
        else if (doc.isConvertableTo ("pdf"))
            pdf = doc.convert ("pdf");
        else
            throw new DocumentConversionException ("Cannot merge document '" + doc
                + "' with id '" + doc.getId () + "' and MIME type '"
                + doc.getMimeType () + "': Only PDF documents allowed");
        final InputStream is = pdf.openInputStream ();
        streamsToClose.add (is);
        mergePdf.addSource (is);
    }
    try (ByteArrayOutputStream os = new ByteArrayOutputStream ())
    {
        mergePdf.setDestinationStream (os);
        ///////////////////////////
        // This is where the IndexOutOfBoundsException would occur
        mergePdf.mergeDocuments ();
        ///////////////////////////
        return new RawDocument (os.toByteArray (), "application/pdf", null);
    }
    catch (final IOException | COSVisitorException e)
    {
        throw new DocumentConversionException ("Exception occurred during PDF document merge", e);
    }
}
finally
{
    // Close every source stream that was handed to the merger
    final Iterator<InputStream> it = streamsToClose.iterator ();
    while (it.hasNext ())
        try (InputStream is = it.next ())
        {}
        catch (final IOException e)
        {
            throw new DocumentConversionException ("IOException occurred closing input stream", e);
        }
}
// Once the processes are complete, the results are written to a file using a
// FileOutputStream wrapped in a BufferedOutputStream
{code}
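A side note on the cleanup loop in the merge step: it throws on the first failed close(), which leaves any later streams in the list unclosed. Below is a minimal sketch of one alternative that uses only the JDK: close every stream, keep the first IOException, and attach later ones as suppressed. The names CloseAllSketch and closeAll are made up for illustration; they are not part of PDFBox or of the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class CloseAllSketch {

    /**
     * Closes every resource in the list, even if some of them fail.
     * The first IOException is rethrown at the end; later failures are
     * attached to it as suppressed exceptions.
     */
    static void closeAll(List<? extends Closeable> resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the in-memory PDF sources handed to the merger
        List<InputStream> streams = new ArrayList<>();
        streams.add(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        streams.add(new ByteArrayInputStream(new byte[] {4, 5, 6}));
        closeAll(streams);
        System.out.println("closed " + streams.size() + " streams");
    }
}
```

With ByteArrayInputStream the close() call is a no-op, so this only matters when the Document implementation returns a stream backed by a real resource.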
was (Author: etherous): same text as the comment above, without the leading "Edit:" note.
> IndexOutOfBoundsException when merging many PDFs in memory
> ----------------------------------------------------------
>
> Key: PDFBOX-2226
> URL: https://issues.apache.org/jira/browse/PDFBOX-2226
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.8.6
> Environment: Windows 7 64-bit, JDK8
> Reporter: Brandon Lyon
> Attachments: foo2_1_1.pdf, foo_1_1.pdf
>
>
> An IndexOutOfBoundsException occurs when attempting to merge many (at least
> 10) PDF documents together. All PDFs exist in byte arrays in memory, not as
> files. The stack trace looks as follows (irrelevant details redacted):
> 2014-07-18 11:48:22,858 ERROR [io.undertow.servlet] (default task-5) ****: Uncaught exception: : ****
> ****
> Caused by: org.apache.pdfbox.exceptions.WrappedIOException
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:267) [pdfbox-1.8.6.jar:]
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1216) [pdfbox-1.8.6.jar:]
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1183) [pdfbox-1.8.6.jar:]
>     at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:236) [pdfbox-1.8.6.jar:]
>     at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:185) [pdfbox-1.8.6.jar:]
>     at ****
>     ... 29 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 145, Size: 145
>     at java.util.ArrayList.rangeCheck(ArrayList.java:638) [rt.jar:1.8.0_05]
>     at java.util.ArrayList.get(ArrayList.java:414) [rt.jar:1.8.0_05]
>     at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110) [pdfbox-1.8.6.jar:]
>     at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106) [pdfbox-1.8.6.jar:]
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) [rt.jar:1.8.0_05]
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) [rt.jar:1.8.0_05]
>     at java.io.FilterOutputStream.close(FilterOutputStream.java:158) [rt.jar:1.8.0_05]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:634) [pdfbox-1.8.6.jar:]
>     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:609) [pdfbox-1.8.6.jar:]
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194) [pdfbox-1.8.6.jar:]
>     ... 34 more
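
The "Index: 145, Size: 145" in the trace means a list with 145 entries was asked for entry 145, one past the end. Below is a hypothetical model of how that can happen in a buffer built from fixed-size chunks: a seek to a position exactly at the end of the allocated chunks computes a chunk index equal to the list size. This is a sketch, not PDFBox's RandomAccessBuffer code; the class name, the chunk size of 1024 and both seek methods are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedBufferSketch {

    // Assumed chunk size; the value used by the library is not stated in the report
    static final int CHUNK_SIZE = 1024;

    private final List<byte[]> chunks = new ArrayList<>();

    ChunkedBufferSketch(int chunkCount) {
        for (int i = 0; i < chunkCount; i++) {
            chunks.add(new byte[CHUNK_SIZE]);
        }
    }

    int chunkCount() {
        return chunks.size();
    }

    long capacity() {
        return (long) chunks.size() * CHUNK_SIZE;
    }

    // Naive seek: looks up the chunk for the position without checking
    // whether that chunk has been allocated yet.
    byte[] seekNaive(long position) {
        int index = (int) (position / CHUNK_SIZE);
        return chunks.get(index);
    }

    // Guarded seek: allocates missing chunks before the lookup.
    byte[] seekGrowing(long position) {
        int index = (int) (position / CHUNK_SIZE);
        while (index >= chunks.size()) {
            chunks.add(new byte[CHUNK_SIZE]);
        }
        return chunks.get(index);
    }

    public static void main(String[] args) {
        ChunkedBufferSketch buffer = new ChunkedBufferSketch(145);
        try {
            // Position 145 * 1024 is the first byte past the last chunk
            buffer.seekNaive(buffer.capacity());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("naive seek failed: " + e.getMessage());
        }
        buffer.seekGrowing(buffer.capacity());
        System.out.println("chunks after guarded seek: " + buffer.chunkCount());
    }
}
```

Whether the real failure follows this pattern would need to be checked against the RandomAccessBuffer source at the line given in the trace.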