[ 
https://issues.apache.org/jira/browse/PDFBOX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258941#comment-17258941
 ] 

Ralf Hauser commented on PDFBOX-5068:
-------------------------------------

Since the OutOfMemoryError (Java heap space) occurred at COSStream.java:218, I 
analyzed what happens there with the help of [^RandomAccessReadBufferDiag.java].

As the stack trace shows, the prepareIncrement() call in COSWriter.write() 
appears to be the problem.

i) With my test file "programWinter2015.pdf" (see PDFBOX-4297 for the signed 
version of it), the approx. 6000 objects are each at most 30K in size. This in 
itself is not a problem.

ii) It looks as if approx. half of them are read, but likely one after another 
and always sequentially except for a 1-byte rewind (so no or very little 
random-access jumping), so this is not necessarily a memory problem. The 
problem seems to be that the RandomAccessReadBuffer consumes its contents in 
the constructor and not when its content is actually read.

1) So it could be helpful to refactor this class to work in a "lazy/on-demand" 
way. With that, possibly ~50% of the prepareIncrement memory usage would 
disappear, since those objects seem never to be read in our signature use case. 
(This implies that FlateFilter.decode is also done lazily in a (chained) 
InputStream.)
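
To illustrate the lazy/on-demand idea from 1), here is a minimal sketch. It is 
not PDFBox code: the class name LazyInflatedStream and its isOpened() helper 
are hypothetical, and java.util.zip.InflaterInputStream only stands in for 
FlateFilter.decode. The point is that nothing is fetched or decompressed in the 
constructor, only on the first read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;
import java.util.zip.InflaterInputStream;

/**
 * Hypothetical sketch (not part of PDFBox): the compressed source is opened
 * and wrapped in an inflating stream only when the first byte is requested.
 */
class LazyInflatedStream extends InputStream {
    // Supplies the still-compressed bytes; invoked at most once.
    private final Supplier<InputStream> rawSource;
    // Chained decoding stream, created on first read.
    private InputStream decoded;

    LazyInflatedStream(Supplier<InputStream> rawSource) {
        // Deliberately no I/O and no buffer allocation here.
        this.rawSource = rawSource;
    }

    private InputStream decoded() {
        if (decoded == null) {
            // Stand-in for FlateFilter.decode: inflate while reading.
            decoded = new InflaterInputStream(rawSource.get());
        }
        return decoded;
    }

    /** True once the underlying source has actually been opened. */
    boolean isOpened() {
        return decoded != null;
    }

    @Override
    public int read() throws IOException {
        return decoded().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return decoded().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Closing a stream that was never read costs nothing.
        if (decoded != null) {
            decoded.close();
        }
    }
}
```

With such a wrapper, objects that are never read during signing would never be 
decompressed, which is where the possible ~50% saving would come from. Whether 
the real RandomAccessReadBuffer can be restructured this way is an open 
question.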

2) If the objects were deallocated immediately after being read, this might 
even lead to constant memory requirements for prepareIncrement(). (This only 
applies if iii) no single object is really large [so no embedded movies of GB 
size] and iv) my assumption is correct that each object is read only once - I 
haven't thoroughly checked that.)
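
The constant-memory idea in 2) can be sketched as follows. This is again not 
PDFBox code: OneAtATimeCopier and copyAll are hypothetical names, and the 
sketch assumes iv) holds, i.e. each object is read exactly once. Every source 
is opened, streamed through one fixed-size buffer, and closed before the next 
one is touched, so only the buffer stays allocated across objects.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch (not part of PDFBox): process objects one at a time. */
class OneAtATimeCopier {
    /**
     * Streams every source to the sink and returns the number of bytes copied.
     * Each source is opened only when its turn comes and is closed right after.
     */
    static long copyAll(List<Callable<InputStream>> sources, OutputStream sink)
            throws Exception {
        byte[] buffer = new byte[8192]; // the only long-lived allocation
        long total = 0;
        for (Callable<InputStream> source : sources) {
            // try-with-resources closes the stream before the next object
            // is opened, so its contents become eligible for collection.
            try (InputStream in = source.call()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    sink.write(buffer, 0, n);
                    total += n;
                }
            }
        }
        return total;
    }
}
```

If an object had to be read a second time, it would have to be reopened from 
the input document, which is why assumption iv) matters here.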

Being even more radical: in the end, the FileOutputStream is not written from 
the objectKeys map assembled by prepareIncrement, but from the 
RandomAccessInputStream in COSWriter.writeExternalSignature(), which hopefully 
uses constant memory too.

3) So, if we do not check whether the page size/orientation where the 
signature shall be placed is really correct, probably most of the 
prepareIncrement parsing is not needed at all for signing (no clue what happens 
if a visible signature is supposed to be placed outside a page ;) - or on page 
20 if the document has only 10 pages).

 

> OutOfMemory while signing large documents - continued
> -----------------------------------------------------
>
>                 Key: PDFBOX-5068
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5068
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Signing
>    Affects Versions: 2.0.23
>            Reporter: Ralf Hauser
>            Priority: Major
>         Attachments: RandomAccessReadBufferDiag.java
>
>
> Continuation of PDFBOX-2512
>  
> in COSWriter.prepareIncrement(), for the test case 
> cosDoc.getXrefTable().keySet() has size 5925. For each of these keys, 
> cosDoc.getObjectFromPool() gets an object that is not just referencing some 
> part of the input document, but duplicates it (which is unavoidable in the 
> case where they are decompressed with FlateFilter - albeit this could 
> possibly be done "lazy")
> -Xmx20m  746/5925
> -Xmx25m 1615/5925
> -Xmx30m 2800/5925
> -Xmx40m 3872/5925
> -Xmx55m 5773/5925
> With 60m, it gets them all, but dies later with less telling
>    java.lang.OutOfMemoryError: GC overhead limit exceeded
>  
> This assumes the patch of PDFBOX-5067 is already in place - or using 
> CreateVisibleSignature2.java as starting point



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
