[ https://issues.apache.org/jira/browse/FOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606914#comment-16606914 ]
ffimbel commented on FOP-2811:
------------------------------

I cannot share the input file I am using to reproduce the issue, as it contains confidential information. If I try to anonymize the file, the hash codes will change and I will no longer be able to reproduce the issue.

Just to illustrate the case, I added the following code to PDFDocumentHandler to keep track of the content stream hash codes:

{code:java}
// Requires: java.nio.charset.StandardCharsets, java.nio.file.Files,
// java.nio.file.Paths, java.nio.file.StandardOpenOption
private void setUpContents() throws IOException {
    PDFStream stream = generator.getStream();
    int hash = stream.streamHashCode();
    // Log page index, stream length and hash code of every content stream
    String line = "Page " + getCurrentPage().getPageIndex()
            + " - stream length " + stream.getDataLength()
            + " - hashcode " + hash + System.lineSeparator();
    Files.write(Paths.get("hashes.txt"), line.getBytes(StandardCharsets.UTF_8),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    // Deduplication is keyed on the hash alone: if another stream with the
    // same hash was already registered, this page reuses that stream
    if (!contents.containsKey(hash)) {
        pdfDoc.registerObject(stream);
        PDFReference ref = new PDFReference(stream);
        contents.put(hash, ref);
    }
    currentPage.setContents(contents.get(hash));
}
{code}

With the input file I am using to reproduce the issue, the log shows that two streams can have the same hash code even though their lengths differ:

{code}
Page 0 - stream length 7119 - hashcode -1433696548
Page 1 - stream length 51188 - hashcode -610578584
Page 2 - stream length 24534 - hashcode -1811615548
Page 3 - stream length 110399 - hashcode -1014686270
Page 4 - stream length 46433 - hashcode 842088398
Page 5 - stream length 7120 - hashcode -1147221505
Page 6 - stream length 44046 - hashcode -1014686270
Page 7 - stream length 30253 - hashcode 993923731
Page 8 - stream length 115925 - hashcode -2088538797
Page 9 - stream length 112109 - hashcode -1370341963
{code}

See pages 3 and 6: both streams have hash code -1014686270, although they are 110399 and 44046 bytes long. Accordingly, in the output document, page 6 is replaced by page 3.

If I change the output format to PDF or AFP, I do not have the issue.
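The log above is consistent with a 32-bit hash being used as proof of equality. FOP's actual `streamHashCode()` implementation is not shown here, so the sketch below uses `java.util.Arrays.hashCode(byte[])` as a stand-in; `ContentDedupSketch`, `Ref` and `register` are hypothetical names, not FOP API. It first shows two different content streams that share a hash code, then one possible way to deduplicate safely: use the hash only to find candidate streams and confirm a match with a byte-for-byte comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ContentDedupSketch {

    // Hypothetical stand-in for a reference to a registered PDF object.
    record Ref(int objectNumber) {}

    // Stored stream bytes together with the reference handed out for them.
    record Entry(byte[] data, Ref ref) {}

    private final Map<Integer, List<Entry>> byHash = new HashMap<>();
    private int nextObjectNumber = 1;

    // Returns a shared reference only when the bytes are identical;
    // the hash code selects a bucket of candidates, it never proves equality.
    Ref register(byte[] data) {
        int hash = Arrays.hashCode(data);
        List<Entry> bucket = byHash.computeIfAbsent(hash, h -> new ArrayList<>());
        for (Entry e : bucket) {
            if (Arrays.equals(e.data(), data)) {
                return e.ref(); // genuine duplicate: reuse the existing object
            }
        }
        Ref ref = new Ref(nextObjectNumber++);
        bucket.add(new Entry(data.clone(), ref));
        return ref;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" contribute the same value to this polynomial hash,
        // so these two different streams collide.
        byte[] pageA = "BT (Aa) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] pageB = "BT (BB) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] pageAAgain = "BT (Aa) Tj ET".getBytes(StandardCharsets.US_ASCII);

        System.out.println(Arrays.hashCode(pageA) == Arrays.hashCode(pageB)); // true

        ContentDedupSketch dedup = new ContentDedupSketch();
        Ref ra = dedup.register(pageA);
        Ref rb = dedup.register(pageB);
        Ref ra2 = dedup.register(pageAAgain);
        System.out.println(ra.equals(rb));  // false: the collision does not merge pages
        System.out.println(ra.equals(ra2)); // true: identical content is still shared
    }
}
```

Keeping a copy of each stream's bytes costs memory. An alternative is to key the map on a cryptographic digest (for example SHA-256 via `java.security.MessageDigest`), which makes accidental collisions practically impossible, although unlike the byte comparison it is not strictly exact.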
> Deduplicate Pdf pages may cause erroneous pages rendering
> ---------------------------------------------------------
>
>                 Key: FOP-2811
>                 URL: https://issues.apache.org/jira/browse/FOP-2811
>             Project: FOP
>          Issue Type: Bug
>          Components: renderer/pdf
>    Affects Versions: 2.2
>            Reporter: ffimbel
>            Priority: Major
>
> The implementation of FOP-2647 (Deduplicate PDF content stream) compares PDF
> page content using a hash code, which may not be unique. As a result, the same
> page can be rendered twice in the final PDF document even though the actual
> pages are different (very inconvenient when the file contains documents for
> multiple recipients).
> Due to confidentiality reasons, I cannot share the test case we used to
> reproduce the issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)