[
https://issues.apache.org/jira/browse/FOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606914#comment-16606914
]
ffimbel commented on FOP-2811:
------------------------------
I cannot share the input file I am using to reproduce the issue because it
contains confidential information.
If I anonymize the file, the hashcodes change and I can no longer reproduce
the issue.
To illustrate the case, I added the following logging code to keep track of
stream hashcodes in PDFDocumentHandler:
{code:java}
private void setUpContents() throws IOException {
    PDFStream stream = generator.getStream();
    int hash = stream.streamHashCode();
    // Log the page index, stream length and hashcode of each content stream
    String line = "Page " + getCurrentPage().getPageIndex()
            + " - stream length " + stream.getDataLength()
            + " - hashcode " + hash + System.lineSeparator();
    Files.write(Paths.get("hashes.txt"), line.getBytes("UTF-8"),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    if (!contents.containsKey(hash)) {
        pdfDoc.registerObject(stream);
        PDFReference ref = new PDFReference(stream);
        contents.put(hash, ref);
    }
    currentPage.setContents(contents.get(hash));
}
{code}
With the input file I am using to reproduce the issue, the log shows that two
streams can have the same hashcode even though their lengths differ:
{code:java}
Page 0 - stream length 7119 - hashcode -1433696548
Page 1 - stream length 51188 - hashcode -610578584
Page 2 - stream length 24534 - hashcode -1811615548
Page 3 - stream length 110399 - hashcode -1014686270
Page 4 - stream length 46433 - hashcode 842088398
Page 5 - stream length 7120 - hashcode -1147221505
Page 6 - stream length 44046 - hashcode -1014686270
Page 7 - stream length 30253 - hashcode 993923731
Page 8 - stream length 115925 - hashcode -2088538797
Page 9 - stream length 112109 - hashcode -1370341963
{code}
See pages 3 and 6: both have hashcode -1014686270 although their stream
lengths differ (110399 vs. 44046).
As a result, page 6 is replaced by page 3 in the output document.
If I change the output format to pdf or afp, the issue does not occur.
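One possible way to avoid this kind of false match is to treat the hashcode
only as a bucket key and to confirm equality byte by byte before reusing an
existing entry. The following is a minimal, standalone sketch of that idea,
not FOP code and not a proposed patch: the class CollisionSafeDeduplicator and
its methods are hypothetical, plain byte arrays stand in for PDFStream content,
and Arrays.hashCode stands in for streamHashCode(), whose actual algorithm is
not shown in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of FOP): deduplicates byte content safely. */
class CollisionSafeDeduplicator {
    // 32-bit hash -> indices of all distinct contents that share this hash
    private final Map<Integer, List<Integer>> buckets = new HashMap<>();
    // Every distinct content registered so far, in registration order
    private final List<byte[]> stored = new ArrayList<>();

    /** Returns the index of the stored copy of content, adding it if new. */
    int register(byte[] content) {
        // Assumption: stand-in for the real stream hash
        int hash = Arrays.hashCode(content);
        List<Integer> candidates =
                buckets.computeIfAbsent(hash, k -> new ArrayList<>());
        for (int index : candidates) {
            // Equal hashes are only a hint; reuse requires equal bytes
            if (Arrays.equals(stored.get(index), content)) {
                return index;
            }
        }
        stored.add(content.clone());
        int index = stored.size() - 1;
        candidates.add(index);
        return index;
    }

    int distinctCount() {
        return stored.size();
    }

    public static void main(String[] args) {
        byte[] shortContent = {5};
        byte[] longContent = {-30, 5}; // different length, same Arrays.hashCode
        CollisionSafeDeduplicator dedup = new CollisionSafeDeduplicator();
        System.out.println("hashes: " + Arrays.hashCode(shortContent)
                + " / " + Arrays.hashCode(longContent));
        System.out.println("index of short content: " + dedup.register(shortContent));
        System.out.println("index of long content: " + dedup.register(longContent));
        System.out.println("index of repeated short content: "
                + dedup.register(new byte[] {5}));
    }
}
```

In this sketch the two arrays collide on their 32-bit hash but are still kept
as separate entries, while a true duplicate is mapped back to the existing
entry. The cost is one extra byte comparison per candidate in a bucket, which
only happens when hashes already match.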
> Deduplicate Pdf pages may cause erroneous pages rendering
> ---------------------------------------------------------
>
> Key: FOP-2811
> URL: https://issues.apache.org/jira/browse/FOP-2811
> Project: FOP
> Issue Type: Bug
> Components: renderer/pdf
> Affects Versions: 2.2
> Reporter: ffimbel
> Priority: Major
>
> The implementation of FOP-2647 (Deduplicate PDF content stream) compares PDF
> page content using a hashcode, which is not guaranteed to be unique. As a
> result, the same page can be rendered twice in the final PDF document although
> the actual pages differ (very inconvenient when the file contains documents
> for multiple recipients).
> Due to confidentiality reasons, I cannot share the test case we used to
> reproduce the issue.