[ https://issues.apache.org/jira/browse/PDFBOX-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151546#comment-17151546 ]
Tilman Hausherr commented on PDFBOX-4908:
-----------------------------------------
I tried a change (skipping arrays and dictionaries) and it works. But then I
looked in the PDF specification, and it could be that one of these dictionaries
contains an array, so maybe it is enough to just skip dictionaries. But then I
wonder: why are these people putting stuff there? Should this weird extra data
be kept, or just dumped?
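
To make the "skip dictionaries" idea concrete, here is a toy sketch. It uses plain java.util maps as stand-ins for PDF dictionaries, not PDFBox's COS classes, and the class and method names are made up for illustration. It also assumes that the merge only adds keys the destination does not already have.

```java
import java.util.Map;

/** Toy model only: java.util.Map stands in for a PDF dictionary. */
class SkipNestedSketch {

    /**
     * Copies entries from src into dest when dest lacks the key,
     * skipping any value that is itself a dictionary (modelled as a Map).
     * Other values, including lists, are copied by reference.
     */
    static void mergeSkippingDictionaries(Map<String, Object> src, Map<String, Object> dest) {
        for (Map.Entry<String, Object> entry : src.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested dictionary: it may hold data owned by the source, so leave it out.
                continue;
            }
            // Assumption: existing destination entries win over source entries.
            dest.putIfAbsent(entry.getKey(), value);
        }
    }
}
```

With this approach the extra nested data is dropped from the merged result, which is exactly the trade-off the question above is about.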
> PDFMergerUtility.mergeInto() does not deep copy metadata
> --------------------------------------------------------
>
> Key: PDFBOX-4908
> URL: https://issues.apache.org/jira/browse/PDFBOX-4908
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.18, 2.0.20
> Environment: Windows, JDK12
> Reporter: Tim Shaffer
> Priority: Minor
> Attachments: bad1.pdf, bad2.pdf, blank.pdf
>
>
> After merging two documents, closing the source document prevents the
> destination document from being saved.
> {code:java}
> // mainDoc can be any existing PDF
> PDDocument mainDoc = PDDocument.load(new File("blank.pdf"));
> PDDocument appendDoc = PDDocument.load(new File("bad1.pdf"));
> //PDDocument appendDoc = PDDocument.load(new File("bad2.pdf"));
> PDFMergerUtility pdfMerger = new PDFMergerUtility();
> pdfMerger.appendDocument(mainDoc, appendDoc);
> appendDoc.close();
> // Exception thrown during save()
> mainDoc.save("temp.pdf");
> mainDoc.close();
> {code}
> Exception:
> {noformat}
> java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
> at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:83)
> at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:133)
> at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1219)
> at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:404)
> at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:526)
> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:464)
> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:448)
> at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1113)
> at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:449)
> at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1386)
> at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1273)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1357)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1328)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1316)
> at Main.main(Main.java:60)
> {noformat}
> Attached are two different PDFs, from different sources, that both cause the
> bug. All sensitive data has been removed, so the PDFs only contain blank
> pages, but the structure is still present which causes the above Exception.
> Also attached is blank.pdf (another blank doc) that I've been testing with as
> the destination.
> The cause seems to be these lines in PDFMergerUtility:
> {code:java}
> PDDocumentInformation destInfo = destination.getDocumentInformation();
> PDDocumentInformation srcInfo = source.getDocumentInformation();
> mergeInto(srcInfo.getCOSObject(), destInfo.getCOSObject(),
> Collections.<COSName>emptySet());
> {code}
> I've tried altering the code to use PDFCloneUtility to clone the
> srcInfo.getCOSObject() before passing it to mergeInto(). That seems to fix
> the issue, but I'm not familiar enough with the code to say if that is the
> correct way to fix this.
>
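
The failure mode described in the report can be reproduced in a small toy model. This is only a sketch with made-up classes (ToyStream, ToyDocument), not PDFBox code: it assumes that a merge which shares object references leaves the destination pointing at streams the source still owns, while a deep copy gives the destination its own data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (not PDFBox code) of a reference-sharing merge versus a deep-copying merge. */
class SharedStreamSketch {

    /** Stand-in for a stream object that becomes unreadable once closed. */
    static final class ToyStream {
        private final byte[] data;
        private boolean closed;

        ToyStream(byte[] data) {
            this.data = data.clone();
        }

        byte[] read() {
            if (closed) {
                throw new IllegalStateException("stream has been closed");
            }
            return data.clone();
        }

        /** Deep copy: the copy has its own data and its own closed flag. */
        ToyStream deepCopy() {
            return new ToyStream(data);
        }

        void close() {
            closed = true;
        }
    }

    /** Stand-in for a document; closing it closes every stream in its info map. */
    static final class ToyDocument {
        final Map<String, ToyStream> info = new LinkedHashMap<>();

        void close() {
            info.values().forEach(ToyStream::close);
        }
    }

    /** Shares the source's stream objects with the destination. */
    static void mergeShallow(ToyDocument src, ToyDocument dest) {
        src.info.forEach(dest.info::putIfAbsent);
    }

    /** Gives the destination independent copies of the source's streams. */
    static void mergeDeep(ToyDocument src, ToyDocument dest) {
        src.info.forEach((key, stream) -> dest.info.putIfAbsent(key, stream.deepCopy()));
    }

    public static void main(String[] args) {
        ToyDocument source = new ToyDocument();
        source.info.put("Extra", new ToyStream(new byte[] {1, 2, 3}));

        ToyDocument shallowDest = new ToyDocument();
        ToyDocument deepDest = new ToyDocument();
        mergeShallow(source, shallowDest);
        mergeDeep(source, deepDest);

        // Closing the source also closes the stream that shallowDest shares with it.
        source.close();

        System.out.println("deep copy readable, bytes: " + deepDest.info.get("Extra").read().length);
        try {
            shallowDest.info.get("Extra").read();
        } catch (IllegalStateException e) {
            System.out.println("shallow copy failed: " + e.getMessage());
        }
    }
}
```

In this model the deep-copying merge plays the role that cloning srcInfo.getCOSObject() plays in the workaround described above. Whether that is the right fix for PDFMergerUtility itself is a separate question.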