[ 
https://issues.apache.org/jira/browse/PDFBOX-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163804#comment-14163804
 ] 

Tilman Hausherr edited comment on PDFBOX-2401 at 10/8/14 5:16 PM:
------------------------------------------------------------------

I did some more research:
- all flate compressed streams are identical in the good and bad files
- I found a difference, which is in the indexed colorspace. That was sort of 
"invisible" in PDFDebugger because it is made of 0 and FF. In the original 
"13-17" file, one starts with 000000FF00FFFF and another starts with 
000000FF000000. In the "double" file, all start with 000000FF000000.
- something is weird in PDFCloneUtility.cloneForNewDocument(): the hex string 
is not cloned every time, the method thinks it has already been cloned?!
- the hash code of the COSString with the hex value is 0 ??????
- however a hash is only a help and identical hashes don't mean equality. A 
look into COSString shows that a String compare is done. Further trace shows 
that these different strings are considered identical by Java.
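
As a side note on the hash observation, here is a small plain-Java sketch (not PDFBox code; the class and method names are made up for illustration). It only shows two general facts: a String hash of 0 is not unusual by itself, and equal hashes do not imply equal values. It does not claim to explain why the COSString hash was 0 in this file.

```java
public class HashVsEquals {
    // Hypothetical helper: true when two strings share a hash code but differ in content.
    static boolean sameHashButNotEqual(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        // The empty string and a string of NUL characters both hash to 0.
        System.out.println("".hashCode());
        System.out.println("\0\0\0".hashCode());
        // "Aa" and "BB" are a well-known pair with the same hash code (2112).
        System.out.println(sameHashButNotEqual("Aa", "BB"));
    }
}
```

So a hash of 0 is only a hint; the real question is what equals() compares.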

Here is the relevant debug code I used. The first line is existing code; it 
runs when the cloner believes that the object already exists in its map of 
already cloned objects.
{code}
 //we are done, it has already been converted.
if (base instanceof COSString)
{
    System.out.println("WTF!?");
    COSString str1 = (COSString) base;
    System.out.println("c1: " + str1.getHexString().substring(0, 20));
    System.out.println("c1 hash: " + str1.hashCode());
    System.out.println("c1 str hash: " + str1.getString().hashCode());
    System.out.println("c1 hex hash: " + str1.getHexString().hashCode());
    COSString str2 = (COSString) clonedVersion.get(base);
    System.out.println("c2: " + str2.getHexString().substring(0, 20));
    System.out.println("c2 hash: " + str2.hashCode());
    System.out.println("c2 str hash: " + str2.getString().hashCode());
    System.out.println("c2 hex hash: " + str2.getHexString().hashCode());
    System.out.println("are they equal? "
            + str1.getString().equals(str2.getString()));
}
{code}
The output:
{code}
WTF!?
c1: 000000FF00FFFF000000
c1 hash: 0
c1 str hash: 0
c1 hex hash: -215576448
c2: 000000FF000000FF0000
c2 hash: 0
c2 str hash: 0
c2 hex hash: -1354755968
are they equal? true
{code}

1st solution: don't make a string compare if forceHexForm is true. Doesn't 
work, because forceHexForm is only set in signatures.

2nd solution: force hex comparison => merge works

3rd solution: remember if the string has weird content => not possible, 
COSStrings can be changed after construction.

4th solution: use byte compare => merge works

I committed solution 4.
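
To illustrate the idea behind the byte compare, here is a minimal standalone sketch. It is not the committed PDFBox change: TextKey, ByteKey and the helper methods are hypothetical, US-ASCII is used only because it is a decoding that is known to be lossy (every byte >= 0x80 becomes U+FFFD), and the sample bytes are chosen to trigger that, they are not the bytes from the attached files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteCompareSketch {
    /** Hypothetical key that compares the decoded text (lossy for bytes >= 0x80). */
    static final class TextKey {
        final byte[] data;
        TextKey(byte[] data) { this.data = data.clone(); }
        String text() { return new String(data, StandardCharsets.US_ASCII); }
        @Override public boolean equals(Object o) {
            return o instanceof TextKey && text().equals(((TextKey) o).text());
        }
        @Override public int hashCode() { return text().hashCode(); }
    }

    /** Hypothetical key that compares the raw bytes, so no decoding is involved. */
    static final class ByteKey {
        final byte[] data;
        ByteKey(byte[] data) { this.data = data.clone(); }
        @Override public boolean equals(Object o) {
            return o instanceof ByteKey && Arrays.equals(data, ((ByteKey) o).data);
        }
        @Override public int hashCode() { return Arrays.hashCode(data); }
    }

    // Mimics a "have I cloned this already?" lookup keyed by decoded text.
    static boolean textKeysCollide(byte[] a, byte[] b) {
        Map<TextKey, String> cloned = new HashMap<>();
        cloned.put(new TextKey(a), "clone of a");
        return cloned.containsKey(new TextKey(b));
    }

    // Same lookup, keyed by raw bytes.
    static boolean byteKeysCollide(byte[] a, byte[] b) {
        Map<ByteKey, String> cloned = new HashMap<>();
        cloned.put(new ByteKey(a), "clone of a");
        return cloned.containsKey(new ByteKey(b));
    }

    public static void main(String[] args) {
        byte[] a = {0x00, (byte) 0x80};
        byte[] b = {0x00, (byte) 0xFF};
        System.out.println("text compare treats them as the same: " + textKeysCollide(a, b));
        System.out.println("byte compare treats them as the same: " + byteKeysCollide(a, b));
    }
}
```

With a text-based compare the second value is wrongly reported as "already cloned", which is the kind of mix-up seen in the merge; comparing bytes keeps the two values apart.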



> Image has wrong colors after Merge
> ----------------------------------
>
>                 Key: PDFBOX-2401
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2401
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel, Utilities
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>         Attachments: michael levine.pdf, p13-17.pdf, p13-17double.pdf
>
>
> Marc Davis from the user mailing list has provided a file (michael 
> levine.pdf) that, when merged with another file, has a black image on page 17 
> ("TL-9"). I tried to investigate / narrow this somewhat:
> - it happens with any other file, or just use the michael levine file twice
> - extracting p17 with PDFSplit and then merging the result doesn't do it
> - extracting p1-17 with PDFSplit and then merging the result does do it
> - extracting p13-17 with PDFSplit and then merging the result does do it, 
> although the black is now on the first page
> The page is not really "black", the colors are incorrect.
> That's all I found out until now. I compared the two files with PDFDebugger 
> and can't see any obvious differences. I looked into the files with 
> NOTEPAD++, there are some differences like that the colorspace is now 
> indirect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
