[ 
https://issues.apache.org/jira/browse/FOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606914#comment-16606914
 ] 

ffimbel commented on FOP-2811:
------------------------------

I cannot share the input file I am using to reproduce the issue, as it contains 
confidential information.
If I try to anonymize the file, the hashcodes will change and I will no longer be 
able to reproduce the issue.

To illustrate the case, I added the following logging statement to 
setUpContents() in PDFDocumentHandler to keep track of the stream hashcodes:
{code:java}
private void setUpContents() throws IOException {
  PDFStream stream = generator.getStream();
  int hash = stream.streamHashCode();
  // Log the page index, stream length and hashcode of each page
  Files.write(Paths.get("hashes.txt"),
      ("Page " + getCurrentPage().getPageIndex()
          + " - stream length " + stream.getDataLength()
          + " - hashcode " + hash + System.lineSeparator()).getBytes("UTF-8"),
      StandardOpenOption.CREATE, StandardOpenOption.APPEND);
  if (!contents.containsKey(hash)) {
    pdfDoc.registerObject(stream);
    PDFReference ref = new PDFReference(stream);
    contents.put(hash, ref);
  }
  currentPage.setContents(contents.get(hash));
}
{code}
With the input file I am using to reproduce the issue, the log shows that 
two streams can have the same hashcode even though their lengths 
differ:
{code:java}
Page 0 - stream length 7119 - hashcode -1433696548
Page 1 - stream length 51188 - hashcode -610578584
Page 2 - stream length 24534 - hashcode -1811615548
Page 3 - stream length 110399 - hashcode -1014686270
Page 4 - stream length 46433 - hashcode 842088398
Page 5 - stream length 7120 - hashcode -1147221505
Page 6 - stream length 44046 - hashcode -1014686270
Page 7 - stream length 30253 - hashcode 993923731
Page 8 - stream length 115925 - hashcode -2088538797
Page 9 - stream length 112109 - hashcode -1370341963
{code}
See pages 3 and 6: both have hashcode -1014686270, but their stream lengths 
are 110399 and 44046.

Accordingly, in the output document, page 6 is replaced by page 3.

If I change the output format to pdf or afp, I do not have the issue.
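
One way to avoid this kind of mix-up is to treat the hashcode only as a first 
filter and to compare the actual bytes before reusing a stream. The following 
is a minimal sketch of that idea, not FOP code: the class StreamDeduplicator 
and its register method are hypothetical names, and it assumes the stream 
content is available as a plain byte array rather than as a PDFStream.
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of FOP): deduplicates byte streams by content.
class StreamDeduplicator {
    // Each hash bucket keeps every distinct stream that produced that hash.
    private final Map<Integer, List<byte[]>> buckets = new HashMap<>();

    // Returns the already stored array if an identical one exists,
    // otherwise stores and returns the given array.
    byte[] register(byte[] data) {
        List<byte[]> candidates =
            buckets.computeIfAbsent(Arrays.hashCode(data), k -> new ArrayList<>());
        for (byte[] candidate : candidates) {
            // An equal hash is only a hint; the bytes decide.
            if (Arrays.equals(candidate, data)) {
                return candidate;
            }
        }
        candidates.add(data);
        return data;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are a well-known pair of inputs with colliding hashes.
        byte[] first = "Aa".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "BB".getBytes(StandardCharsets.US_ASCII);
        StreamDeduplicator dedup = new StreamDeduplicator();
        System.out.println("same hash: "
            + (Arrays.hashCode(first) == Arrays.hashCode(second)));
        System.out.println("first kept: " + (dedup.register(first) == first));
        System.out.println("second kept: " + (dedup.register(second) == second));
        System.out.println("copy reused: " + (dedup.register(first.clone()) == first));
    }
}
{code}
In main, the two arrays share the same Arrays.hashCode value, yet both are 
kept because their contents differ. Only the byte-identical copy is mapped 
back to the stream stored earlier.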


> Deduplicate Pdf pages may cause erroneous pages rendering
> ---------------------------------------------------------
>
>                 Key: FOP-2811
>                 URL: https://issues.apache.org/jira/browse/FOP-2811
>             Project: FOP
>          Issue Type: Bug
>          Components: renderer/pdf
>    Affects Versions: 2.2
>            Reporter: ffimbel
>            Priority: Major
>
> The implementation of FOP-2647 (Deduplicate PDF content stream) compares PDF 
> page content using a hashcode, which may not be unique. As a result, the same 
> page can be rendered twice in the final PDF document even though the actual 
> pages are different (very inconvenient when the file contains documents for 
> multiple recipients).
> For confidentiality reasons, I cannot share the test case we used to 
> reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
