[ 
https://issues.apache.org/jira/browse/PDFBOX-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151546#comment-17151546
 ] 

Tilman Hausherr edited comment on PDFBOX-4908 at 7/5/20, 11:17 AM:
-------------------------------------------------------------------

I tried a change (skip arrays and dictionaries when merging) and it works. But 
then I looked in the PDF specification, and one of these dictionaries 
(ViewerPreferences) may legally contain an array. So maybe just skip 
dictionaries. But then I wonder: why are people putting this stuff there? 
Should the weird extra data be kept, or just dumped?
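
To make the trade-off concrete, here is a minimal sketch of a merge that copies plain values and arrays but skips nested dictionaries. It is not the actual change and does not use PDFBox classes: a `java.util.Map` stands in for a dictionary and a `List` for an array, and the names `InfoMergeSketch` and `mergeSkippingDictionaries` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InfoMergeSketch {
    // Hypothetical stand-in for the merge step (not PDFBox API).
    // A Map models a dictionary, a List models an array.
    // An entry is copied only if its key is not excluded, is not already
    // present in dst, and its value is not a nested dictionary.
    static void mergeSkippingDictionaries(Map<String, Object> src,
                                          Map<String, Object> dst,
                                          Set<String> exclude) {
        for (Map.Entry<String, Object> e : src.entrySet()) {
            if (exclude.contains(e.getKey()) || dst.containsKey(e.getKey())) {
                continue;
            }
            if (e.getValue() instanceof Map) {
                continue; // nested dictionary: dropped rather than shared
            }
            dst.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> src = new LinkedHashMap<>();
        src.put("Title", "Source title");
        src.put("Author", "Someone");
        // Unexpected nested dictionary, the kind of extra data discussed above.
        src.put("Extra", new LinkedHashMap<String, Object>());
        // An array value, which this variant keeps.
        List<String> keywords = new ArrayList<>(Arrays.asList("a", "b"));
        src.put("Keywords", keywords);

        Map<String, Object> dst = new LinkedHashMap<>();
        dst.put("Title", "Destination title");

        mergeSkippingDictionaries(src, dst, Collections.<String>emptySet());
        System.out.println(dst.keySet());
    }
}
```

Note that the array is still copied by reference in this sketch, so anything it points to would remain shared with the source. Skipping only dictionaries keeps legal arrays but does not by itself make them independent of the source document.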


was (Author: tilman):
I tried a change (skip arrays and dictionaries) and it works, but then I looked 
in the PDF specification and it could be that one of these dictionaries 
contains an array. So maybe just skip dictionaries. But then I wonder, why are 
these people putting stuff there? Should this weird extra data be kept, or just 
dumped?

> PDFMergerUtility.mergeInto() does not deep copy metadata
> --------------------------------------------------------
>
>                 Key: PDFBOX-4908
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4908
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.18, 2.0.20
>         Environment: Windows, JDK12
>            Reporter: Tim Shaffer
>            Priority: Minor
>         Attachments: bad1.pdf, bad2.pdf, blank.pdf
>
>
> After merging two documents, closing the source document prevents the 
> destination document from being saved.
> {code:java}
> // mainDoc can be any existing PDF
> PDDocument mainDoc = PDDocument.load(new File("blank.pdf"));
> PDDocument appendDoc = PDDocument.load(new File("bad1.pdf"));
> //PDDocument appendDoc = PDDocument.load(new File("bad2.pdf"));
> PDFMergerUtility pdfMerger = new PDFMergerUtility();
> pdfMerger.appendDocument(mainDoc, appendDoc);
> appendDoc.close();
> // Exception thrown during save()
> mainDoc.save("temp.pdf");
> mainDoc.close();
> {code}
> Exception:
> {noformat}
> java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
>       at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:83)
>       at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:133)
>       at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1219)
>       at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:404)
>       at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
>       at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:526)
>       at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:464)
>       at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:448)
>       at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1113)
>       at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:449)
>       at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1386)
>       at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1273)
>       at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1357)
>       at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1328)
>       at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1316)
>       at Main.main(Main.java:60)
> {noformat}
> Attached are two PDFs from different sources that both trigger the bug.  All 
> sensitive data has been removed, so the PDFs contain only blank pages, but the 
> structure that causes the above exception is still present.  Also attached is 
> blank.pdf (another blank document), which I have been using as the 
> destination.
> The cause seems to be these lines in PDFMergerUtility:
> {code:java}
>  PDDocumentInformation destInfo = destination.getDocumentInformation();
>  PDDocumentInformation srcInfo = source.getDocumentInformation();
>  mergeInto(srcInfo.getCOSObject(), destInfo.getCOSObject(),
>          Collections.<COSName>emptySet());
> {code}
> I've tried altering the code to use PDFCloneUtility to clone the 
> srcInfo.getCOSObject() before passing it to mergeInto().  That seems to fix 
> the issue, but I'm not familiar enough with the code to say if that is the 
> correct way to fix this.
>  
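
The failure mode described in the report can be illustrated with a toy model. This is only a sketch under stated assumptions: `SharedStreamSketch`, `ToyStream`, `mergeShallow`, `mergeCloned` and `closeAll` are hypothetical names that stand in for a stream object, the two merge strategies and closing the source document. None of them are PDFBox classes or methods, and the reporter's actual PDFCloneUtility change is not shown in the report, so it is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedStreamSketch {
    // Toy stand-in for a stream object that becomes unreadable once closed.
    static final class ToyStream {
        private final String data;
        private boolean closed;

        ToyStream(String data) { this.data = data; }

        void close() { closed = true; }

        String read() {
            if (closed) {
                throw new IllegalStateException("stream has been closed");
            }
            return data;
        }

        // Independent copy that does not share the closed flag.
        ToyStream copy() { return new ToyStream(data); }
    }

    // Shallow merge: dst ends up holding the very same instances as src.
    static void mergeShallow(Map<String, ToyStream> src, Map<String, ToyStream> dst) {
        for (Map.Entry<String, ToyStream> e : src.entrySet()) {
            if (!dst.containsKey(e.getKey())) {
                dst.put(e.getKey(), e.getValue());
            }
        }
    }

    // Clone-then-merge: dst gets its own copies of the source values.
    static void mergeCloned(Map<String, ToyStream> src, Map<String, ToyStream> dst) {
        for (Map.Entry<String, ToyStream> e : src.entrySet()) {
            if (!dst.containsKey(e.getKey())) {
                dst.put(e.getKey(), e.getValue().copy());
            }
        }
    }

    // Models closing the source document: every stream it owns is closed.
    static void closeAll(Map<String, ToyStream> doc) {
        for (ToyStream s : doc.values()) {
            s.close();
        }
    }

    public static void main(String[] args) {
        Map<String, ToyStream> src = new LinkedHashMap<>();
        src.put("Thumb", new ToyStream("bytes"));

        Map<String, ToyStream> shallow = new LinkedHashMap<>();
        Map<String, ToyStream> cloned = new LinkedHashMap<>();
        mergeShallow(src, shallow);
        mergeCloned(src, cloned);

        closeAll(src);

        // The cloned destination is unaffected by closing the source.
        System.out.println("cloned read: " + cloned.get("Thumb").read());
        try {
            shallow.get("Thumb").read();
        } catch (IllegalStateException ex) {
            // The shallow destination still points at the closed source stream.
            System.out.println("shallow read failed: " + ex.getMessage());
        }
    }
}
```

In the toy model the shallow destination fails only when it is read after the source is closed, which matches the report: the merge itself succeeds and the exception surfaces later, during save().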


