[ https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925503#comment-17925503 ]
Tilman Hausherr commented on PDFBOX-5950:
-----------------------------------------
I tried your change with the trunk (3.0 likely behaves the same) by running
java -jar pdfbox-app-4.0.0-SNAPSHOT.jar merge ComSquare1.pdf Ghostscript1.pdf res.pdf
Then start PDFDebugger, choose "view", "show internal structure", and look
for {{Info/ImPDF/Images/Kids/[0]}}: there should be an image there, and it's missing.
With 2.0 I get an exception when merging from the command line like this:
java -jar pdfbox-app-2.0.34-SNAPSHOT.jar PDFMerger ComSquare1.pdf Ghostscript1.pdf res20.pdf
{noformat}
Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
    at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:83)
    at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:133)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1290)
    at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:416)
    at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:195)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:570)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:496)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:480)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1184)
    at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:455)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1457)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1344)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1381)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1353)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1337)
    at org.apache.pdfbox.multipdf.PDFMergerUtility.legacyMergeDocuments(PDFMergerUtility.java:488)
    at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:349)
    at org.apache.pdfbox.tools.PDFMerger.merge(PDFMerger.java:70)
    at org.apache.pdfbox.tools.PDFMerger.main(PDFMerger.java:49)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:85)
{noformat}
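The trace is consistent with the destination still holding references to objects owned by an already-closed source document at the time save() runs. A minimal model of that lifetime problem follows, using only the JDK. All class and method names here (LifetimeSketch, Stream, SourceDoc, Destination) are hypothetical stand-ins, not PDFBox API, and the assumption that some objects are shared by reference rather than copied is only a guess at what happens inside the merger:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LifetimeSketch {

    /** Hypothetical stand-in for a stream owned by a source document. */
    static final class Stream {
        private final String data;
        private boolean closed;

        Stream(String data) { this.data = data; }

        String read() throws IOException {
            if (closed) {
                throw new IOException("stream has been closed and cannot be read");
            }
            return data;
        }
    }

    /** Hypothetical source document: closing it closes every stream it owns. */
    static final class SourceDoc implements Closeable {
        final List<Stream> streams = new ArrayList<>();

        SourceDoc(String... contents) {
            for (String c : contents) {
                streams.add(new Stream(c));
            }
        }

        @Override
        public void close() {
            for (Stream s : streams) {
                s.closed = true;
            }
        }
    }

    /** Hypothetical destination: append() shares references (assumption), save() reads them. */
    static final class Destination {
        private final List<Stream> streams = new ArrayList<>();

        void append(SourceDoc src) { streams.addAll(src.streams); }

        String save() throws IOException {
            StringBuilder out = new StringBuilder();
            for (Stream s : streams) {
                out.append(s.read());
            }
            return out.toString();
        }
    }

    /** Closes the source right after appending, as in the proposed change; save() then fails. */
    static String mergeClosingEarly() throws IOException {
        Destination dest = new Destination();
        try (SourceDoc src = new SourceDoc("a", "b")) {
            dest.append(src);
        }
        return dest.save();
    }

    /** Keeps the source open until after save(); the shared streams are still readable. */
    static String mergeClosingLate() throws IOException {
        Destination dest = new Destination();
        try (SourceDoc src = new SourceDoc("a", "b")) {
            dest.append(src);
            return dest.save();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mergeClosingLate());
        try {
            mergeClosingEarly();
        } catch (IOException e) {
            System.out.println("early close failed: " + e.getMessage());
        }
    }
}
```

In this model, closing each source early is only safe if everything the destination needs has been copied rather than referenced, which is the property the merged Info entry above appears to violate.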
> pdfbox PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
> Key: PDFBOX-5950
> URL: https://issues.apache.org/jira/browse/PDFBOX-5950
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.32, 3.0.0 PDFBox
> Environment: jdk11
> Reporter: asdpboy
> Priority: Major
> Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report
> it. Below are the details:
>
> When there are a large number of sources (e.g., thousands), the `tobeclosed`
> list keeps every loaded PDF document in memory until the merge finishes. This
> poses a risk of Out-of-Memory (OOM) errors during the merge process.
>
> The following adjustment can be made: close the sourceDoc object immediately
> after it has been appended.
>
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> {code:java}
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
>     }
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
> {code}
> One of the OOM cases:
> !oom.png!
> Comparison of Memory Usage Before and After Modification (Merging a 16.8MB
> File 200 Times, with JVM Heap Size Limit Set to 2GB)
> Before Modification: An OutOfMemoryError (OOM) occurred after just over 1
> minute of operation. Due to insufficient heap memory, Full GC (Full Garbage
> Collection) was triggered frequently, which can be observed from the CPU
> usage curve on the left.
> !before.png!
> After Modification: Heap memory is now reclaimed normally by the garbage
> collector, and no OOM occurs.
> !after.png!
>