Hi Alan,

I can sympathize with the performance-loss aspect of this patch, but I nevertheless think it is a move in the right direction. You must admit that the performance optimization in AbstractSet.removeAll was a rather shaky one (dependent on the relative sizes of the two collections) and as such prone to sudden performance spikes that could appear for no apparent reason. More importantly, the fact that the semantics of the method could change drastically depending on something as incidental as the relative sizes of the two collections was inexplicable. You may argue that the semantics of the method are undefined if it is called on, or passed, an ersatz Collection instance, but even undefined semantics can at least be consistent - i.e. independent of such things as the relative sizes of the two collections.
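To make that concrete, here is a small self-contained sketch that reproduces the size-based branch of the pre-patch AbstractSet.removeAll (the helper name removeAllOptimized is mine, not the JDK's). With a case-insensitive TreeSet as the argument, the outcome flips depending on which collection happens to be larger:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RemoveAllDemo {
    // Reproduction of the size-based optimization as it stood before the
    // patch: which collection's membership semantics apply depends on
    // which collection is larger.
    public static boolean removeAllOptimized(Set<Object> s, Collection<?> c) {
        boolean modified = false;
        if (s.size() > c.size()) {
            // Iterate the argument; uses s's notion of equality via s.remove.
            for (Object e : c)
                modified |= s.remove(e);
        } else {
            // Iterate the receiver; uses c's notion of equality via c.contains.
            for (Iterator<?> i = s.iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        // Case 1: receiver larger -> s1.remove("A") uses HashSet's
        // case-sensitive equality, so nothing is removed.
        Set<Object> s1 = new HashSet<>(List.of("a", "b", "c"));
        Set<String> t1 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        t1.add("A");
        removeAllOptimized(s1, t1);
        System.out.println(s1.size()); // 3

        // Case 2: receiver smaller -> t2.contains("a") uses the
        // case-insensitive comparator, so "a" is removed.
        Set<Object> s2 = new HashSet<>(List.of("a"));
        Set<String> t2 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        t2.addAll(List.of("A", "X", "Y"));
        removeAllOptimized(s2, t2);
        System.out.println(s2.size()); // 0
    }
}
```

The logical operation is the same in both cases ("remove t's elements from s"), yet the result differs purely because of the size comparison.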

In a perfect world, this optimization would never have been there to begin with. People would eventually find bottlenecks when using this method and change their code to use explicit iteration or the like. There may be perfectly valid code out there that uses conformant collection implementations which don't cause AbstractSet.removeAll to behave inconsistently, and such code may perform badly after this patch. But it will behave consistently for cases that use ersatz Collections, and that is more important in my opinion.

As Stuart says, optimizations mustn't change semantics. So is there perhaps a narrower optimization that doesn't change the semantics?

Here's a sketch of one:
- create a marker interface (could be JDK-internal) and mark all conforming Set implementations with it
- use the AbstractSet.removeAll optimization only when both collections implement the marker interface
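Roughly what that could look like (all names here are invented for the sketch; this is not the actual JDK change):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;

// Hypothetical marker interface: implemented only by Set types whose
// membership test is equals-based, so iterating either side of
// removeAll yields the same result.
interface EqualsBasedMembership {}

class MarkedHashSet<E> extends HashSet<E> implements EqualsBasedMembership {
    @Override
    public boolean removeAll(Collection<?> c) {
        boolean modified = false;
        // Fast path (iterate the smaller argument) only when BOTH
        // collections are known to use equals-based membership.
        if (c instanceof EqualsBasedMembership && size() > c.size()) {
            for (Object e : c)
                modified |= remove(e);
        } else {
            // Otherwise always iterate the receiver and ask c.contains:
            // the outcome no longer depends on relative sizes.
            for (Iterator<?> i = iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }
}
```

When both sets carry the marker, the fast path and the slow path agree by construction, so the optimization can no longer be observed through the semantics.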


Regards, Peter

On 5/17/19 7:06 AM, Alan Snyder wrote:
Hi Stuart,

I believe I already accepted the fact of ersatz Collections.

Could you explain the inconsistency in the specification that affects 
removeAll()? I don’t see it.

In the current specification, the classes that can create ersatz Collections 
all call out this behavior in their documentation.

SortedSet:
Note that the ordering maintained by a sorted set (whether or not an
explicit comparator is provided) must be <i>consistent with equals</i> if
the sorted set is to correctly implement the {@code Set} interface.
The behavior of a sorted set <i>is</i> well-defined even if its
ordering is inconsistent with equals; it just fails to obey the general
contract of the {@code Set} interface.

TreeSet is similar
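For illustration, a comparator that is not consistent with equals (here, ordering strings by length) makes a TreeSet deviate from the Set contract while still behaving in a well-defined way:

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class InconsistentWithEquals {
    public static void main(String[] args) {
        // Ordering by length is not consistent with equals: distinct
        // strings of the same length compare as 0.
        Set<String> byLength =
                new TreeSet<>(Comparator.comparingInt(String::length));
        byLength.add("aa");
        System.out.println(byLength.add("bb"));      // false: compares equal to "aa"
        System.out.println(byLength.contains("zz")); // true, though !"zz".equals("aa")
        System.out.println(byLength.size());         // 1
    }
}
```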

IdentityHashMap:

This class is <i>not</i> a general-purpose {@code Map} implementation!  While 
this class implements the {@code Map} interface, it
intentionally violates {@code Map's} general contract, which mandates the
use of the {@code equals} method when comparing objects.

IdentityHashMap.keySet:

While the object returned by this method implements the
{@code Set} interface, it does <i>not</i> obey {@code Set's} general
contract.  Like its backing map, the set returned by this method
defines element equality as reference-equality rather than
object-equality.  This affects the behavior of its {@code contains},
{@code remove}, {@code containsAll}, {@code equals}, and
{@code hashCode} methods.
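A short illustration of that reference-equality behavior:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

class IdentityKeySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new IdentityHashMap<>();
        String k1 = new String("key");
        String k2 = new String("key"); // equals(k1), but a distinct object
        m.put(k1, 1);
        m.put(k2, 2);                  // not an overwrite: keys compared by ==

        Set<String> keys = m.keySet();
        System.out.println(keys.size());                      // 2
        System.out.println(keys.contains(k1));                // true
        System.out.println(keys.contains(new String("key"))); // false: equal value, different object
    }
}
```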

The logical implication of not supporting the general contract is that 
Collection operations performed on an ersatz Collection may have
unexpected (implementation-specific) effects, which in turn implies that 
passing one of these ersatz Collections as a Collection-typed parameter
may have unexpected (implementation-specific) effects.

In other words, whether removeAll is performed on an ersatz Collection or its 
parameter is an ersatz Collection, the effects may be
implementation-specific and inconsistent with the specification of removeAll.

That is not to say that improving the design would not have benefit; of course 
it does. The bug reports you cite prove that.
However, the compatibility impact of the change needs to be taken into account.

   Alan



On May 16, 2019, at 8:42 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:

Hi Alan,

Whether you call them "ersatz" collections, "non-standard" collections, or 
collections with different membership semantics, they have been around in the collections framework 
from day one. This is perhaps unfortunate, and it may be considered bad design, but it is a 
fact.

The issue is not with undefined or implementation-specific behavior. The issue 
is that the specification as it stands is self-contradictory. In particular, it 
assumes that certain operations are symmetric that in fact are not, if you read 
other parts of the spec and derive logical conclusions from them. This is what 
I'm trying to fix. This is not a mere matter of intellectual consistency. 
This state of affairs has real consequences. There is a whole family of bugs 
linked to this one which complain both about the performance issue and the 
semantic issue. All of these bugs are directly related to the 
self-contradictory nature of the current specification. With my changes, the 
spec is self-contradictory in fewer places. (There is, however, more work to be 
done; see JDK-8190545.)

Your lead point, though, is about performance. The performance optimization of 
AbstractSet.removeAll is one of those that assumes that the operation is 
symmetric, when in fact it isn't. As a rule, optimizations mustn't change 
semantics. Therefore, it has to go.

s'marks

