[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190965#comment-17190965
 ] 

Christian Appl edited comment on PDFBOX-4723 at 9/5/20, 2:31 AM:
-----------------------------------------------------------------

*The violated contract is:*
 _"{color:#8c8c8c}Whenever it is invoked on the same object more than once 
during an execution of a Java application, the \{@code hashCode} method must 
consistently return the same integer, provided no information
 used in \{@code equals} comparisons on the object is modified.{color}"_

*The reason for the contract violation is:*
 org.apache.pdfbox.cos.COSStream method "equals" is checking for o1==o2 while 
method "hashCode" is constructing a state based hash value:
 {color:#000000}Object{color}[] {color:#000000}members {color}= {items, 
randomAccess, {color:#871094}scratchFile{color}, {color:#871094}isWriting};
 return 
{color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});

"equals" will return true no matter at what point in time it is called 
(assuming: o1 == o2).
 The "hashCode" result however will depend on the internal state of the object.
 => If at two different points in time "hashCode" is called, it might return 
two different values.

"equals" will still return true, but the hash value will differ.

*In short the contract violation is:*
 No information used in "equals" comparisons has been modified and "hashCode" 
does return different integers.

*The possible solutions are:*
 1. Either: Equals must include "items", "randomAccess", "scratchFile" and 
"isWriting" in it's equals evaluation.
 2. Or: hashCode must not use said fields to calculate the hash value.
 3. Or: hashCode must return a constant value, to match o1 == o2 of the equals 
implementation.

*Concerning the COSWriter/HashTable example:*
 As stated: The result of hashcode can differ at different points in time, not 
that of equals. (I did not claim that the result of "equals" changed at all)
 Collections such as Hashtable are using the results of "equals", +aswell as+ 
the results of "hashCode" to check for matches:
 {color:#0033b3}if {color}((e.{color:#871094}hash {color}== 
{color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
 {color:#0033b3}return 
{color}({color:#007e8a}V{color})e.{color:#871094}value{color};
 }

Therefore such Collections are affected, when equals/hashCode are altered.
 _org.apache.pdfbox.pdfwriter.COSWriter_ is using HashTable aswell as HashMap, 
both classes are affected by those changes in the described manner.

You can find screenshots of a debugstate above, where 
org.apache.pdfbox.pdfwriter.COSWriter claimed, that a 
_org.apache.pdfbox.cos.COSStream_ instance was not contained in a HashTable, 
while it can clearly be found in said HashTable. In said example the "equals" 
method returns true, while "e.hash == hash" returns false, because the field 
"randomAccess" of _org.apache.pdfbox.cos.COSStream_ had changed in the meantime.

To reiterate: +*PDFBox* is using said Collections for different purposes.+

*Concerns:*
 1. As initially claimed java.utils classes are affected by the changes to 
"equals" and "hashCode".
 2. java.utils classes affected by these changes are used by central PDFBox 
classes.
 3. Applications using this library are affected for the same reasons.

*Claim:*
 If even the library itself is attempting to use such objects in Maps, Tables 
and Lists, then such objects should behave absolutely reliable, when used in 
such Collections.
 I quote your statement to underline why it must be doubted, that such objects 
_will_ behave reliable in such Collections:

*Quote:*
 "The behavior of a map is not specified +if the value of an object is changed 
in a manner that affects {{equals}} comparisons+ while the object is a key in 
the map."
 To reiterate: The HashTable is not only using "equals" to identify keys, but 
"hashCode" aswell.

*Clarification:*
 _Great care must be exercised if mutable objects are used as map keys._
 => You should be really cautious when doing this, an immutable object would be 
better, but it is not forbidden and is possible, but:
 _The behavior of a map is not specified if the value of an object is changed 
in a manner that affects {{equals}} comparisons while the object is a key in 
the map._
 => Don't do this at all, if the key object's state is affecting the results of 
equals comparisons and it's results for equals (and hashCode) are inconsistent 
and unreliable. Funny things will happen to your map, if you would be using 
such objects.

And indeed - funny things happened to ArrayLists and HashTables, when 
collecting COSBase objects such as COSStreams. Overriding "equals" and 
"hashCode" is making _such objects_ out of i.e. COSStream.

*Questions:*
 Why is that a good thing? Why should I be glad, that I can/should not use 
COSBase objects in Collections now, when it was fine to do that before?

It is concerning to find usages of such Collections in PDFBox and then to read:
 _"Thus, already by using mutable objects as map keys you do something you are 
warned about. In particular changes of {{equals}} and {{hashCode}} results of 
mutable classes are not considered a problem, merely the use of instances of 
such classes as map keys is considered a problem."_


was (Author: capsvd):
*The violated contract is:*
_"{color:#8c8c8c}Whenever it is invoked on the same object more than once 
during an execution of a Java application, the \{@code hashCode} method must 
consistently return the same integer, provided no information
used in \{@code equals} comparisons on the object is modified.{color}"_

*The reason for the contract violation is:*
org.apache.pdfbox.cos.COSStream method "equals" is checking for o1==o2 while 
method "hashCode" is constructing a state based hash value:
{color:#000000}Object{color}[] {color:#000000}members {color}= {items, 
randomAccess, {color:#871094}scratchFile{color}, {color:#871094}isWriting};
return 
{color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});

"equals" will return true no matter at what point in time it is called 
(assuming: o1 == o2).
The "hashCode" result however will depend on the internal state of the object.
=> If at two different points in time "hashCode" is called, it might return two 
different values.

"equals" will still return true, but the hash value will differ.

*In short the contract violation is:*
No information used in "equals" comparisons has been modified and "hashCode" 
does return different integers.

*The possible solutions are:*
1. Equals must include "items", "randomAccess", "scratchFile" and "isWriting" 
in it's equals evaluation.
2. hashCode must not use said fields to calculate the hash value.
3. hashCode must return a constant value, to match o1 == o2 of the equals 
implementation.

*Concerning the COSWriter/HashTable example:*
As stated: The result of hashcode can differ at different points in time, not 
that of equals. (I did not claim that the result of "equals" changed at all)
Collections such as Hashtable are using the results of "equals", +aswell as+ 
the results of "hashCode" to check for matches:
{color:#0033b3}if {color}((e.{color:#871094}hash {color}== 
{color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
{color:#0033b3}return 
{color}({color:#007e8a}V{color})e.{color:#871094}value{color};
}

Therefore such Collections are affected, when equals/hashCode are altered.
_org.apache.pdfbox.pdfwriter.COSWriter_ is using HashTable aswell as HashMap, 
both classes are affected by those changes in the described manner.

You can find screenshots of a debugstate above, where 
org.apache.pdfbox.pdfwriter.COSWriter claimed, that a 
_org.apache.pdfbox.cos.COSStream_ instance was not contained in a HashTable, 
while it can clearly be found in said HashTable. In said example the "equals" 
method returns true, while "e.hash == hash" returns false, because the field 
"randomAccess" of _org.apache.pdfbox.cos.COSStream_ had changed in the meantime.

To reiterate: +*PDFBox* is using said Collections for different purposes.+

*Concerns:*
1. As initially claimed java.utils classes are affected by the changes to 
"equals" and "hashCode".
2. java.utils classes affected by these changes are used by central PDFBox 
classes.
3. Applications using this library are affected for the same reasons.

*Claim:*
If even the library itself is attempting to use such objects in Maps, Tables 
and Lists, then such objects should behave absolutely reliable, when used in 
such Collections.
I quote your statement to underline why it must be doubted, that such objects 
_will_ behave reliable in such Collections:

*Quote:*
"The behavior of a map is not specified +if the value of an object is changed 
in a manner that affects {{equals}} comparisons+ while the object is a key in 
the map."
To reiterate: The HashTable is not only using "equals" to identify keys, but 
"hashCode" aswell.

*Clarification:*
_Great care must be exercised if mutable objects are used as map keys._
=> You should be really cautious when doing this, an immutable object would be 
better, but it is not forbidden and is possible, but:
_The behavior of a map is not specified if the value of an object is changed in 
a manner that affects {{equals}} comparisons while the object is a key in the 
map._
=> Don't do this at all, if the key object's state is affecting the results of 
equals comparisons and it's results for equals (and hashCode) are inconsistent 
and unreliable. Funny things will happen to your map, if you would be using 
such objects.

 And indeed - funny things happened to ArrayLists and HashTables, when 
collecting COSBase objects such as COSStreams. Overriding "equals" and 
"hashCode" is making _such objects_ out of i.e. COSStream.

*Questions:*
Why is that a good thing? Why should I be glad, that I can/should not use 
COSBase objects in Collections now, when it was fine to do that before?

It is concerning to find usages of such Collections in PDFBox and then to read:
_"Thus, already by using mutable objects as map keys you do something you are 
warned about. In particular changes of {{equals}} and {{hashCode}} results of 
mutable classes are not considered a problem, merely the use of instances of 
such classes as map keys is considered a problem."_

> Add equals() and hashCode() to PDAnnotation and COS objects
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4723
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4723
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: PDModel
>    Affects Versions: 2.0.18
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: bird_burst.heic.pdf, image-2020-09-02-13-52-21-370.png, 
> image-2020-09-02-13-53-02-622.png, image-2020-09-02-13-54-31-630.png, 
> screenshot-1.png
>
>
> In order to proper support removeAll/retainAll for COSArrayList we need to 
> detect if entries are in fact duplicates of others. This currently fails as 
> even though one might add the same instance of an annotation object multiple 
> times to setAnnotations getting the annotations will have individual 
> instances. See the discussion at PDFBOX-4669.
> In order to proper support removal we need to be able to detect equality 
> where an object is equal if the underlying COSDictionary has the same entries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to