[
https://issues.apache.org/jira/browse/PDFBOX-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163804#comment-14163804
]
Tilman Hausherr edited comment on PDFBOX-2401 at 10/8/14 5:16 PM:
------------------------------------------------------------------
I did some more research:
- all flate compressed streams are identical in the good and bad files
- I found a difference, which is in the indexed colorspace. That was sort of
"invisible" in PDFDebugger because it is made of 0 and FF. In the original
"13-17" file, one starts with 000000FF00FFFF and another starts with
000000FF000000. In the "double" file, all start with 000000FF000000.
- something is weird in PDFCloneUtility.cloneForNewDocument(): the hex string
is not cloned every time, the method thinks it has already been cloned?!
- the hash code of the COSString with the hex value is 0 ??????
- however, a hash is only a hint, and identical hashes don't imply equality. A
look into COSString shows that a String compare is done. Further tracing shows
that these different strings are considered identical by Java.
Here is the relevant debug code I used. The first line is existing code; it is
run when the cloner believes that the object already exists in its map of
already cloned objects.
{code}
// we are done, it has already been converted.
if (base instanceof COSString)
{
    System.out.println("WTF!?");
    // the object about to be cloned
    COSString str1 = (COSString) base;
    System.out.println("c1: " + str1.getHexString().substring(0, 20));
    System.out.println("c1 hash: " + str1.hashCode());
    System.out.println("c1 str hash: " + str1.getString().hashCode());
    System.out.println("c1 hex hash: " + str1.getHexString().hashCode());
    // the object the map returns as its "already cloned" version
    COSString str2 = (COSString) clonedVersion.get(base);
    System.out.println("c2: " + str2.getHexString().substring(0, 20));
    System.out.println("c2 hash: " + str2.hashCode());
    System.out.println("c2 str hash: " + str2.getString().hashCode());
    System.out.println("c2 hex hash: " + str2.getHexString().hashCode());
    System.out.println("are they equal? " +
        str1.getString().equals(str2.getString()));
}
{code}
The output:
{code}
WTF!?
c1: 000000FF00FFFF000000
c1 hash: 0
c1 str hash: 0
c1 hex hash: -215576448
c2: 000000FF000000FF0000
c2 hash: 0
c2 str hash: 0
c2 hex hash: -1354755968
are they equal? true
{code}
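The effect can be reproduced outside PDFBox. The sketch below is not PDFBox code: it uses the JDK's US-ASCII decoding as a stand-in for a lossy byte-to-String conversion (an assumption; the conversion inside COSString.getString() may well differ), and a hypothetical StringKey class whose equals() and hashCode() go through the decoded text. Two different byte arrays then collide as map keys, similar to what the cloner's map did.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LossyKeyDemo {
    // Hypothetical key type: equality is based on the decoded text, not on the raw bytes.
    static final class StringKey {
        private final byte[] bytes;

        StringKey(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        // US-ASCII decoding replaces every byte >= 0x80 with U+FFFD, so it loses information.
        String text() {
            return new String(bytes, StandardCharsets.US_ASCII);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof StringKey && text().equals(((StringKey) o).text());
        }

        @Override
        public int hashCode() {
            return text().hashCode();
        }
    }

    // Returns true if a map keyed on a would wrongly report b as already present.
    static boolean collides(byte[] a, byte[] b) {
        Map<StringKey, String> cloned = new HashMap<>();
        cloned.put(new StringKey(a), "clone of a");
        return cloned.containsKey(new StringKey(b));
    }

    public static void main(String[] args) {
        byte[] a = {0x00, (byte) 0x80};
        byte[] b = {0x00, (byte) 0xFF};
        System.out.println("bytes equal? " + Arrays.equals(a, b));
        System.out.println("map thinks b is already cloned? " + collides(a, b));
    }
}
```

The byte values here are chosen only to trigger the US-ASCII replacement behaviour; they are not the palette data from the attached files.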
1st solution: don't make a string compare if forceHexForm is true. Doesn't
work, because forceHexForm is only set in signatures.
2nd solution: force hex comparison => merge works
3rd solution: remember if the string has weird content => not possible,
COSStrings can be changed after construction.
4th solution: use byte compare => merge works
I committed solution 4.
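As an illustration of the idea behind solution 4, here is a minimal sketch of a byte-based comparison. It is not the committed PDFBox change: BytesKey and sameKey are hypothetical names used only for this example, and the two arrays are the seven-byte prefixes quoted above.

```java
import java.util.Arrays;

public class ByteCompareDemo {
    // Hypothetical key type (not the PDFBox COSString): equality is defined on the raw bytes.
    static final class BytesKey {
        private final byte[] bytes;

        BytesKey(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
        }

        // Kept consistent with equals(): equal byte content gives equal hash codes.
        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    static boolean sameKey(byte[] a, byte[] b) {
        return new BytesKey(a).equals(new BytesKey(b));
    }

    public static void main(String[] args) {
        // Prefixes of the two indexed colorspace lookup strings from the "13-17" file.
        byte[] first = {0x00, 0x00, 0x00, (byte) 0xFF, 0x00, (byte) 0xFF, (byte) 0xFF};
        byte[] second = {0x00, 0x00, 0x00, (byte) 0xFF, 0x00, 0x00, 0x00};
        System.out.println("same key? " + sameKey(first, second));
    }
}
```

Because no decoding step is involved, two strings can only be treated as the same map key when every byte matches.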
> Image has wrong colors after Merge
> ----------------------------------
>
> Key: PDFBOX-2401
> URL: https://issues.apache.org/jira/browse/PDFBOX-2401
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel, Utilities
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Attachments: michael levine.pdf, p13-17.pdf, p13-17double.pdf
>
>
> Marc Davis from the user mailing list has provided a file (michael
> levine.pdf) that, when merged with another file, has a black image on page 17
> ("TL-9"). I tried to investigate / narrow this somewhat:
> - it happens with any other file, or just use the michael levine file twice
> - extracting p17 with PDFSplit and then merging the result doesn't do it
> - extracting p1-17 with PDFSplit and then merging the result does do it
> - extracting p13-17 with PDFSplit and then merging the result does do it,
> although the black is now on the first page
> The page is not really "black", the colors are incorrect.
> That's all I have found out so far. I compared the two files with PDFDebugger
> and can't see any obvious differences. I looked into the files with
> NOTEPAD++, there are some differences like that the colorspace is now
> indirect.