Re: [jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

Maruan Sahyoun Wed, 23 Jan 2013 01:28:29 -0800

Hi Manoj,

The size alone is not the cause of the issue. In a recent project we were 
handling PDFs larger than the one you are talking about.


1. Can you test with the non-sequential parser, i.e. PDDocument.loadNonSeq(…), 
and confirm whether it causes the same issue?
2. Can you upload a sample PDF that enables us to reproduce the issue? Without 
that it will be very difficult to say why this is happening.
3. Of course you can try larger heap settings until it works, but I don't 
think this is a good approach.
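
Regarding point 3: before raising the heap further, it may be worth checking 
how much heap the JVM actually received. A -Xmx value is read as bytes unless 
it carries a suffix such as m or g, so the sketch below assumes the flag was 
really given as -Xmx1024m. It uses only the standard library; the class name 
HeapCheck, its methods and the safety factor of 2 are made up for illustration 
and are not part of PDFBox. The factor is a guess, not a measured value.

```java
import java.io.File;

// Hypothetical helper (not part of PDFBox): compares the JVM's maximum heap
// with the size of the file that is about to be parsed.
final class HeapCheck {

    /**
     * Returns true if fileBytes multiplied by the safety factor still fits
     * into maxHeapBytes. The factor is an assumption; in-memory parsing can
     * need considerably more than the raw file size.
     */
    static boolean fitsInHeap(long fileBytes, long maxHeapBytes, double factor) {
        return fileBytes * factor <= maxHeapBytes;
    }

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Upper bound of the heap this JVM will try to use (reflects -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMegabytes(maxHeap) + " MB");

        if (args.length > 0) {
            File pdf = new File(args[0]);
            long size = pdf.length(); // 0 if the file does not exist
            System.out.println("File size: " + toMegabytes(size) + " MB");
            System.out.println("Likely to fit in heap: "
                    + fitsInHeap(size, maxHeap, 2.0));
        }
    }
}
```

Run it with the same JVM options as the failing program, e.g. 
java -Xmx1024m HeapCheck large.pdf (the file name is a placeholder). If the 
file does not fit, that supports looking for an approach that avoids holding 
the complete document in memory.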

In addition, it would be good if you could describe what you want to 
achieve with the PDF. Maybe there are ways of doing so without parsing the 
complete file.

With kind regards

Maruan Sahyoun


On 23.01.2013 at 10:18, "Manoj Patel (JIRA)" <[email protected]> wrote:

> 
>    [ 
> https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560504#comment-13560504
>  ] 
> 
> Manoj Patel commented on PDFBOX-1498:
> -------------------------------------
> 
> Sorry, but I cannot share the document with anyone. I have created a new 
> document, which is around 700 MB. Now when I try the same program, it gives 
> the Java heap space exception below, even though I have set the -Xmx1024 
> parameter for it.
> 
> Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:243)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
>       at imageData.ReadLargeFile.main(ReadLargeFile.java:13)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>       at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:59)
>       at org.apache.pdfbox.cos.COSStream.createFilteredStream(COSStream.java:415)
>       at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:452)
>       at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
>       ... 3 more
> 
> Is there any way to read it?
> 
>> Index Out Of Bounds Exception while reading large PDF Document 
>> ---------------------------------------------------------------
>> 
>>                Key: PDFBOX-1498
>>                URL: https://issues.apache.org/jira/browse/PDFBOX-1498
>>            Project: PDFBox
>>         Issue Type: Bug
>>           Reporter: Manoj Patel
>>           Assignee: Andreas Lehmkühler
>> 
>> I am getting a java.lang.IndexOutOfBoundsException while reading a large 
>> PDF document (800 MB). 
>> Below is the full stack trace:
>> Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
>>      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:243)
>>      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>>      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
>>      at imageData.AddFooter.main(AddFooter.java:26)
>> Caused by: java.lang.IndexOutOfBoundsException: Index: 3377, Size: 3377
>>      at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>      at java.util.ArrayList.get(ArrayList.java:322)
>>      at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
>>      at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
>>      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>>      at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
>>      at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:606)
>>      at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
>>      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
>>      ... 3 more
> 
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
