[
https://issues.apache.org/jira/browse/PDFBOX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258941#comment-17258941
]
Ralf Hauser commented on PDFBOX-5068:
-------------------------------------
Since the OutOfMemoryError (Java heap space) was thrown at COSStream.java:218, I
analyzed what happens there with the help of [^RandomAccessReadBufferDiag.java].
As the stack trace shows, the prepareIncrement() call in COSWriter.write()
appears to be the problem.
i) In my test file "programWinter2015.pdf" (see PDFBOX-4297 for the signed
version of it), each of the approx. 6000 objects has a size of up to 30K. This
in itself is not a problem.
ii) Approximately half of them appear to be read, but probably one after another
and always sequentially apart from a 1-byte rewind (so little or no
random-access jumping), so this is not necessarily a memory problem either. The
problem seems to be that RandomAccessReadBuffer consumes its content in the
constructor rather than when that content is actually read.
1) So it could be helpful to refactor this class to work in a lazy/on-demand
way. With that, possibly ~50% of the prepareIncrement memory usage would
disappear, since those objects seem never to be read in our signature use case.
(This implies that FlateFilter.decode is also done lazily in a (chained)
InputStream.)
2) If the objects were deallocated immediately after being read, this might even
lead to constant memory requirements for prepareIncrement(). This only applies
if iii) no single object is really large (so no embedded movies of GB size)
and iv) my assumption is correct that each object is read only once; I haven't
thoroughly checked that.
Being even more radical: in the end, the FileOutputStream is not written using
the objectKeys map assembled by prepareIncrement, but via the
RandomAccessInputStream in COSWriter.writeExternalSignature(), which hopefully
is constant memory too.
3) So, if we do not check whether the page size/orientation where the
signature shall be placed is really correct, most of the prepareIncrement
parsing is probably not needed at all for signing. (No clue what happens if a
visible signature is supposed to be placed outside a page ;) or on page 20 if
the document has only 10 pages.)
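To illustrate what "lazy/on-demand" in point 1) could look like, here is a
minimal sketch that uses only java.util.zip and java.io, not PDFBox itself.
The names LazyInflateDemo, LazyInputStream, lazyInflate and isOpened are
hypothetical and exist only for this illustration; they are not PDFBox APIs.
The assumption is that the compressed bytes stay referenced cheaply and the
inflater is created only when a caller actually reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical illustration only; none of these types exist in PDFBox. */
public class LazyInflateDemo {

    /** Defers opening the real stream until the first read() call. */
    static final class LazyInputStream extends InputStream {
        private final Supplier<InputStream> source;
        private InputStream delegate; // stays null until something is read

        LazyInputStream(Supplier<InputStream> source) {
            this.source = source;
        }

        /** True once the underlying stream has been created. */
        boolean isOpened() {
            return delegate != null;
        }

        private InputStream delegate() {
            if (delegate == null) {
                delegate = source.get();
            }
            return delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate().read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate().read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            if (delegate != null) {
                delegate.close();
            }
        }
    }

    /** Helper for the demo: compress bytes with zlib/deflate. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    /** Wraps compressed bytes; nothing is inflated in this call. */
    static LazyInputStream lazyInflate(byte[] compressed) {
        return new LazyInputStream(
                () -> new InflaterInputStream(new ByteArrayInputStream(compressed)));
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = deflate("stream content".getBytes(StandardCharsets.UTF_8));
        LazyInputStream never = lazyInflate(compressed); // never read, never inflated
        LazyInputStream used = lazyInflate(compressed);
        try (InputStream in = used) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        System.out.println("never opened: " + never.isOpened());
        System.out.println("used opened: " + used.isOpened());
    }
}
```

With such a wrapper, objects that are never read during signing would cost
only the reference to their compressed bytes. Whether that fits into
RandomAccessReadBuffer and COSStream as they are today is not checked here.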
> OutOfMemory while signing large documents - continued
> -----------------------------------------------------
>
> Key: PDFBOX-5068
> URL: https://issues.apache.org/jira/browse/PDFBOX-5068
> Project: PDFBox
> Issue Type: Improvement
> Components: Signing
> Affects Versions: 2.0.23
> Reporter: Ralf Hauser
> Priority: Major
> Attachments: RandomAccessReadBufferDiag.java
>
>
> Continuation of PDFBOX-2512
>
> in COSWriter.prepareIncrement(), for the test case
> cosDoc.getXrefTable().keySet() has size 5925. For each of these keys,
> cosDoc.getObjectFromPool() gets an object that is not just referencing some
> part of the input document, but duplicates it (which is unavoidable in the
> case where they are decompressed with FlateFilter - albeit this could
> possibly be done "lazy")
> -Xmx20m 746/5925
> -Xmx25m 1615/5925
> -Xmx30m 2800/5925
> -Xmx40m 3872/5925
> -Xmx55m 5773/5925
> With 60m, it gets them all, but dies later with less telling
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> This assumes the patch of PDFBOX-5067 is already in place, or that
> CreateVisibleSignature2.java is used as the starting point