[jira] [Comment Edited] (PDFBOX-2226) IndexOutOfBoundsException when merging many PDFs in memory

Brandon Lyon (JIRA) Tue, 22 Jul 2014 11:10:57 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070601#comment-14070601
 ]


Brandon Lyon edited comment on PDFBOX-2226 at 7/22/14 6:09 PM:
---------------------------------------------------------------

Edit: I used the PDF file attached to the link you provided, and I was using 
the latest snapshot of 2.0 at the time. Yes, it works perfectly in the latest 
1.8.7 snapshot. In 2.0 it runs without an exception, but the multiline content 
is corrupted. A regression, maybe?

It could be a combination of things that caused the problem. Here is how I 
handle the documents, from loading the template to saving the completed, 
merged document:
{code:java}
// File loaded into memory (essentially stored as a byte array, within a
// Document object)
byte[] data;

// Data is read from the byte array using ByteArrayInputStream
try (InputStream is = new ByteArrayInputStream (data))
{
    ...
}

// There are two processes involved: filling and merging. It's not clear which
// one did the " ) Tj 0 - 13 Td (" thing, but merging is definitely where the
// IndexOutOfBoundsException was occurring

// The first process fills the PDF form fields, and saves it to a new byte array
final Document template;
final Map<String, Object> parameters;
// openInputStream returns a ByteArrayInputStream for the typical Document
// implementation
try (InputStream is = template.openInputStream ())
{
        final PDDocument document = PDDocument.load (is);
        try
        {
                final PDDocumentCatalog catalog = document.getDocumentCatalog ();
                final PDAcroForm form = catalog.getAcroForm ();
                for (final Map.Entry<String, Object> e : parameters.entrySet ())
                {
                        final PDField field = form.getField (e.getKey ());
                        if (field != null)
                        {
                                if (field instanceof PDCheckbox)
                                {
                                        final Object value = e.getValue ();
                                        final boolean boolValue = (value instanceof Boolean)
                                                ? (Boolean) value
                                                : Boolean.parseBoolean (value.toString ());
                                        if (boolValue)
                                                ((PDCheckbox) field).check ();
                                        else
                                                ((PDCheckbox) field).unCheck ();
                                }
                                else
                                        field.setValue (e.getValue ().toString ());
                        }
                }
                // os is the output stream backing the new byte array; its declaration is not shown here
                document.save (os);
        }
        finally
        {
                document.close ();
        }
}
catch (final IOException | COSVisitorException e)
{
        throw new RuntimeException ("Exception thrown while populating PDF field data", e);
}

// The second process reads from multiple byte arrays, one for each document,
// merges them, and saves the result to a new byte array
final List<InputStream> streamsToClose = new LinkedList<> ();
try
{
        final PDFMergerUtility mergePdf = new PDFMergerUtility ();
        for (final Document doc : this.parts)
        {
                Document pdf;
                if (MimeType.MIME_COMPARATOR.compare (doc.getMimeType (), "pdf") == 0)
                        pdf = doc;
                else if (doc.isConvertableTo ("pdf"))
                        pdf = doc.convert ("pdf");
                else
                        throw new DocumentConversionException ("Cannot merge document '" + doc + "' with id '" + doc.getId ()
                                        + "' and MIME type '" + doc.getMimeType () + "': Only PDF documents allowed");
                final InputStream is = pdf.openInputStream ();
                streamsToClose.add (is);
                mergePdf.addSource (is);
        }
        try (ByteArrayOutputStream os = new ByteArrayOutputStream ())
        {
                mergePdf.setDestinationStream (os);
                
                ///////////////////////////
                // This is where the IndexOutOfBoundsException would occur
                mergePdf.mergeDocuments ();
                ///////////////////////////
                
                return new RawDocument (os.toByteArray (), "application/pdf", null);
        }
        catch (final IOException | COSVisitorException e)
        {
                throw new DocumentConversionException ("Exception occurred during PDF document merge", e);
        }
}
finally
{
        final Iterator<InputStream> it = streamsToClose.iterator ();
        while (it.hasNext ())
                try (InputStream is = it.next ())
                {}
                catch (final IOException e)
                {
                        throw new DocumentConversionException ("IOException occurred closing input stream", e);
                }
}

// Once the processes are complete, the result is written to a file using a
// FileOutputStream wrapped in a BufferedOutputStream
{code}
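
The finally block above stops at the first stream whose close() fails, so any remaining streams stay open, and an exception thrown from finally can mask one raised by the merge itself. One possible alternative, as a minimal sketch only: close every stream first and report failures afterwards. CloseAllSketch and closeAll are hypothetical names used for illustration; they are not part of PDFBox or of the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class CloseAllSketch
{
    /**
     * Hypothetical helper: closes every element of the list, even if some
     * of them fail. The first IOException is rethrown after the loop, and
     * any later ones are attached to it as suppressed exceptions.
     */
    public static void closeAll (final List<? extends Closeable> streams) throws IOException
    {
        IOException first = null;
        for (final Closeable c : streams)
        {
            try
            {
                c.close ();
            }
            catch (final IOException e)
            {
                if (first == null)
                    first = e;
                else
                    first.addSuppressed (e);
            }
        }
        if (first != null)
            throw first;
    }

    public static void main (final String[] args) throws IOException
    {
        // Stand-ins for the per-document streams collected in streamsToClose
        final List<InputStream> streams = new ArrayList<> ();
        streams.add (new ByteArrayInputStream (new byte[] { 1, 2, 3 }));
        streams.add (new ByteArrayInputStream (new byte[] { 4, 5, 6 }));
        closeAll (streams);
        System.out.println ("closed " + streams.size () + " streams");
    }
}
```

For ByteArrayInputStream specifically, close() has no effect, so this only matters when a Document implementation hands back a stream that holds a real resource.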


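The last step in the code above, writing the finished byte array to a file, is only described in a comment. A minimal sketch of that step, assuming the merged document is already available as a byte array: WriteBytesSketch and writeToFile are hypothetical names, and the sample bytes are a placeholder, not a valid PDF.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

public class WriteBytesSketch
{
    /**
     * Hypothetical helper: writes the whole byte array to the target file
     * through a BufferedOutputStream wrapping a FileOutputStream.
     * try-with-resources flushes and closes both streams.
     */
    public static void writeToFile (final byte[] data, final File target) throws IOException
    {
        try (OutputStream os = new BufferedOutputStream (new FileOutputStream (target)))
        {
            os.write (data);
        }
    }

    public static void main (final String[] args) throws IOException
    {
        // Placeholder content standing in for the merged document's bytes
        final byte[] merged = "placeholder bytes".getBytes (StandardCharsets.US_ASCII);
        final File target = File.createTempFile ("merged", ".pdf");
        target.deleteOnExit ();
        writeToFile (merged, target);

        // Read the file back to confirm that the bytes round-trip unchanged
        final byte[] readBack = Files.readAllBytes (target.toPath ());
        System.out.println (Arrays.equals (merged, readBack));
    }
}
```
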
was (Author: etherous):
Edit: I used the PDF file attached to the link you provided, and I was using 
the latest snapshot of 2.0 at the time


> IndexOutOfBoundsException when merging many PDFs in memory
> ----------------------------------------------------------
>
>                 Key: PDFBOX-2226
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2226
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.8.6
>         Environment: Windows 7 64-bit, JDK8
>            Reporter: Brandon Lyon
>         Attachments: foo2_1_1.pdf, foo_1_1.pdf
>
>
> An IndexOutOfBoundsException occurs when attempting to merge many (at least 
> 10) PDF documents together. All PDFs exist in byte arrays in memory, not as 
> files. The stack trace looks as follows (irrelevant details redacted):
> 2014-07-18 11:48:22,858 ERROR [io.undertow.servlet] (default task-5) ****: Uncaught exception: : ****
>       ****
> Caused by: org.apache.pdfbox.exceptions.WrappedIOException
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:267) [pdfbox-1.8.6.jar:]
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1216) [pdfbox-1.8.6.jar:]
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1183) [pdfbox-1.8.6.jar:]
>       at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:236) [pdfbox-1.8.6.jar:]
>       at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:185) [pdfbox-1.8.6.jar:]
>       at ****
>       ... 29 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 145, Size: 145
>       at java.util.ArrayList.rangeCheck(ArrayList.java:638) [rt.jar:1.8.0_05]
>       at java.util.ArrayList.get(ArrayList.java:414) [rt.jar:1.8.0_05]
>       at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110) [pdfbox-1.8.6.jar:]
>       at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106) [pdfbox-1.8.6.jar:]
>       at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) [rt.jar:1.8.0_05]
>       at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) [rt.jar:1.8.0_05]
>       at java.io.FilterOutputStream.close(FilterOutputStream.java:158) [rt.jar:1.8.0_05]
>       at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:634) [pdfbox-1.8.6.jar:]
>       at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:609) [pdfbox-1.8.6.jar:]
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194) [pdfbox-1.8.6.jar:]
>       ... 34 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)
