[
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190965#comment-17190965
]
Christian Appl edited comment on PDFBOX-4723 at 9/5/20, 2:31 AM:
-----------------------------------------------------------------
*The violated contract is:*
_"{color:#8c8c8c}Whenever it is invoked on the same object more than once
during an execution of a Java application, the \{@code hashCode} method must
consistently return the same integer, provided no information
used in \{@code equals} comparisons on the object is modified.{color}"_
*The reason for the contract violation is:*
org.apache.pdfbox.cos.COSStream method "equals" is checking for o1==o2 while
method "hashCode" is constructing a state based hash value:
{color:#000000}Object{color}[] {color:#000000}members {color}= {items,
randomAccess, {color:#871094}scratchFile{color}, {color:#871094}isWriting};
return
{color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});
"equals" will return true no matter at what point in time it is called
(assuming: o1 == o2).
The "hashCode" result however will depend on the internal state of the object.
=> If at two different points in time "hashCode" is called, it might return
two different values.
"equals" will still return true, but the hash value will differ.
*In short the contract violation is:*
No information used in "equals" comparisons has been modified and "hashCode"
does return different integers.
*The possible solutions are:*
1. Either: Equals must include "items", "randomAccess", "scratchFile" and
"isWriting" in it's equals evaluation.
2. Or: hashCode must not use said fields to calculate the hash value.
3. Or: hashCode must return a constant value, to match o1 == o2 of the equals
implementation.
*Concerning the COSWriter/HashTable example:*
As stated: The result of hashcode can differ at different points in time, not
that of equals. (I did not claim that the result of "equals" changed at all)
Collections such as Hashtable are using the results of "equals", +aswell as+
the results of "hashCode" to check for matches:
{color:#0033b3}if {color}((e.{color:#871094}hash {color}==
{color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
{color:#0033b3}return
{color}({color:#007e8a}V{color})e.{color:#871094}value{color};
}
Therefore such Collections are affected, when equals/hashCode are altered.
_org.apache.pdfbox.pdfwriter.COSWriter_ is using HashTable aswell as HashMap,
both classes are affected by those changes in the described manner.
You can find screenshots of a debugstate above, where
org.apache.pdfbox.pdfwriter.COSWriter claimed, that a
_org.apache.pdfbox.cos.COSStream_ instance was not contained in a HashTable,
while it can clearly be found in said HashTable. In said example the "equals"
method returns true, while "e.hash == hash" returns false, because the field
"randomAccess" of _org.apache.pdfbox.cos.COSStream_ had changed in the meantime.
To reiterate: +*PDFBox* is using said Collections for different purposes.+
*Concerns:*
1. As initially claimed java.utils classes are affected by the changes to
"equals" and "hashCode".
2. java.utils classes affected by these changes are used by central PDFBox
classes.
3. Applications using this library are affected for the same reasons.
*Claim:*
If even the library itself is attempting to use such objects in Maps, Tables
and Lists, then such objects should behave absolutely reliable, when used in
such Collections.
I quote your statement to underline why it must be doubted, that such objects
_will_ behave reliable in such Collections:
*Quote:*
"The behavior of a map is not specified +if the value of an object is changed
in a manner that affects {{equals}} comparisons+ while the object is a key in
the map."
To reiterate: The HashTable is not only using "equals" to identify keys, but
"hashCode" aswell.
*Clarification:*
_Great care must be exercised if mutable objects are used as map keys._
=> You should be really cautious when doing this, an immutable object would be
better, but it is not forbidden and is possible, but:
_The behavior of a map is not specified if the value of an object is changed
in a manner that affects {{equals}} comparisons while the object is a key in
the map._
=> Don't do this at all, if the key object's state is affecting the results of
equals comparisons and it's results for equals (and hashCode) are inconsistent
and unreliable. Funny things will happen to your map, if you would be using
such objects.
And indeed - funny things happened to ArrayLists and HashTables, when
collecting COSBase objects such as COSStreams. Overriding "equals" and
"hashCode" is making _such objects_ out of i.e. COSStream.
*Questions:*
Why is that a good thing? Why should I be glad, that I can/should not use
COSBase objects in Collections now, when it was fine to do that before?
It is concerning to find usages of such Collections in PDFBox and then to read:
_"Thus, already by using mutable objects as map keys you do something you are
warned about. In particular changes of {{equals}} and {{hashCode}} results of
mutable classes are not considered a problem, merely the use of instances of
such classes as map keys is considered a problem."_
was (Author: capsvd):
*The violated contract is:*
_"{color:#8c8c8c}Whenever it is invoked on the same object more than once
during an execution of a Java application, the \{@code hashCode} method must
consistently return the same integer, provided no information
used in \{@code equals} comparisons on the object is modified.{color}"_
*The reason for the contract violation is:*
org.apache.pdfbox.cos.COSStream method "equals" is checking for o1==o2 while
method "hashCode" is constructing a state based hash value:
{color:#000000}Object{color}[] {color:#000000}members {color}= {items,
randomAccess, {color:#871094}scratchFile{color}, {color:#871094}isWriting};
return
{color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});
"equals" will return true no matter at what point in time it is called
(assuming: o1 == o2).
The "hashCode" result however will depend on the internal state of the object.
=> If at two different points in time "hashCode" is called, it might return two
different values.
"equals" will still return true, but the hash value will differ.
*In short the contract violation is:*
No information used in "equals" comparisons has been modified and "hashCode"
does return different integers.
*The possible solutions are:*
1. Equals must include "items", "randomAccess", "scratchFile" and "isWriting"
in it's equals evaluation.
2. hashCode must not use said fields to calculate the hash value.
3. hashCode must return a constant value, to match o1 == o2 of the equals
implementation.
*Concerning the COSWriter/HashTable example:*
As stated: The result of hashcode can differ at different points in time, not
that of equals. (I did not claim that the result of "equals" changed at all)
Collections such as Hashtable are using the results of "equals", +aswell as+
the results of "hashCode" to check for matches:
{color:#0033b3}if {color}((e.{color:#871094}hash {color}==
{color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
{color:#0033b3}return
{color}({color:#007e8a}V{color})e.{color:#871094}value{color};
}
Therefore such Collections are affected, when equals/hashCode are altered.
_org.apache.pdfbox.pdfwriter.COSWriter_ is using HashTable aswell as HashMap,
both classes are affected by those changes in the described manner.
You can find screenshots of a debugstate above, where
org.apache.pdfbox.pdfwriter.COSWriter claimed, that a
_org.apache.pdfbox.cos.COSStream_ instance was not contained in a HashTable,
while it can clearly be found in said HashTable. In said example the "equals"
method returns true, while "e.hash == hash" returns false, because the field
"randomAccess" of _org.apache.pdfbox.cos.COSStream_ had changed in the meantime.
To reiterate: +*PDFBox* is using said Collections for different purposes.+
*Concerns:*
1. As initially claimed java.utils classes are affected by the changes to
"equals" and "hashCode".
2. java.utils classes affected by these changes are used by central PDFBox
classes.
3. Applications using this library are affected for the same reasons.
*Claim:*
If even the library itself is attempting to use such objects in Maps, Tables
and Lists, then such objects should behave absolutely reliable, when used in
such Collections.
I quote your statement to underline why it must be doubted, that such objects
_will_ behave reliable in such Collections:
*Quote:*
"The behavior of a map is not specified +if the value of an object is changed
in a manner that affects {{equals}} comparisons+ while the object is a key in
the map."
To reiterate: The HashTable is not only using "equals" to identify keys, but
"hashCode" aswell.
*Clarification:*
_Great care must be exercised if mutable objects are used as map keys._
=> You should be really cautious when doing this, an immutable object would be
better, but it is not forbidden and is possible, but:
_The behavior of a map is not specified if the value of an object is changed in
a manner that affects {{equals}} comparisons while the object is a key in the
map._
=> Don't do this at all, if the key object's state is affecting the results of
equals comparisons and it's results for equals (and hashCode) are inconsistent
and unreliable. Funny things will happen to your map, if you would be using
such objects.
And indeed - funny things happened to ArrayLists and HashTables, when
collecting COSBase objects such as COSStreams. Overriding "equals" and
"hashCode" is making _such objects_ out of i.e. COSStream.
*Questions:*
Why is that a good thing? Why should I be glad, that I can/should not use
COSBase objects in Collections now, when it was fine to do that before?
It is concerning to find usages of such Collections in PDFBox and then to read:
_"Thus, already by using mutable objects as map keys you do something you are
warned about. In particular changes of {{equals}} and {{hashCode}} results of
mutable classes are not considered a problem, merely the use of instances of
such classes as map keys is considered a problem."_
> Add equals() and hashCode() to PDAnnotation and COS objects
> -----------------------------------------------------------
>
> Key: PDFBOX-4723
> URL: https://issues.apache.org/jira/browse/PDFBOX-4723
> Project: PDFBox
> Issue Type: Sub-task
> Components: PDModel
> Affects Versions: 2.0.18
> Reporter: Maruan Sahyoun
> Assignee: Maruan Sahyoun
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: bird_burst.heic.pdf, image-2020-09-02-13-52-21-370.png,
> image-2020-09-02-13-53-02-622.png, image-2020-09-02-13-54-31-630.png,
> screenshot-1.png
>
>
> In order to proper support removeAll/retainAll for COSArrayList we need to
> detect if entries are in fact duplicates of others. This currently fails as
> even though one might add the same instance of an annotation object multiple
> times to setAnnotations getting the annotations will have individual
> instances. See the discussion at PDFBOX-4669.
> In order to proper support removal we need to be able to detect equality
> where an object is equal if the underlying COSDictionary has the same entries.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]