> I think the original idea was that the "general contract" was that 
> collections contained elements whose membership was defined by equals(). 
> However, for a very long time now there have been collections in the JDK such 
> as SortedSet and IdentityHashMap (more precisely, IdentityHashMap's keyset). 
> I think we have to accept the fact that these are collections even though 
> their membership semantics differ from those provided by equals().

Perhaps we must accept their placement in the type hierarchy for compatibility 
reasons, but it seems reasonable to me to declare these exceptional classes to 
be nonstandard, which means that their behavior when used as a Collection might 
be unexpected.

An alternative, perhaps, would be to add methods that describe the nature of 
the membership, e.g. whether based on equals/hash or a comparator (or something 
else?), which would then become part of the general contract. You’ll probably 
call this a non-starter :-)

> After these are fixed, the specification and implementation will at least be 
> self-consistent. However, it will still be the case that HashSet, TreeSet, 
> and IdentityHashMap's keyset will have different membership semantics, and if 
> you pass them around interchangeably you might get unexpected results. I 
> think we just have to live with that.


I’m not sure what you mean by interchangeably. A TreeSet could misbehave when 
passed to a TreeSet if the method assumes the “general contract” is satisfied 
by its parameter. Or are you adding a special-case requirement that every 
method on a Collection that takes a Collection parameter must behave properly 
when the actual argument is an instance of the same class, or something like 
that?

> Once we've admitted to the fact that different collections can have different 
> membership semantics, the main point of this bug (JDK-6982173) is to fix up 
> the semantics of AbstractSet.removeAll() to remove the semantic shifting that 
> can occur based on the relative sizes of the collections.

If we declare that collections can have different membership semantics and 
still satisfy the general contract of Collection, then we need to come up with 
a new specification of the general contract.

However, if we specify that membership based on anything other than equals/hash 
is nonstandard, then this “bug” is not a bug. It is a valid implementation on 
the assumption that the parameter satisfies the general contract of the 
parameter type. The behavior that you call a semantic shift, I would call an 
unexpected behavior resulting from using a nonstandard collection where a 
Collection was expected. I also believe that the performace penalty of fixing 
this bug is “astonishing”. One could argue whether the performance penalty is 
more or less astonishing than the unexpected behavior when a non-standard 
collection is used as the parameter.

> Introducing a new interface type is basically a non-starter.


Hopefully, what you mean is that introducting a new interface type and 
incompatibly rearranging the type hierarchy is a non-starter.

I still think it would be useful to:

1. introduce a Membership interface

2. have Collection extend Membership

3. add a method to Collection: removeAllMembers(Membership)

> Objects of type Collection get passed around a lot, and users can usefully 
> rely on things like size(), they can iterate the Collection, etc.

Except when they are implementing something like removeAll()? They can iterate 
but not assume that they are getting all the objects that are members of the 
collection, but instead they are getting representative objects?

  Alan


> On Feb 25, 2019, at 11:06 AM, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> 
> 
> On 2/15/19 11:30 AM, Alan Snyder wrote:
>> I think this situation is a mess.
>> The “general contract” of a Collection, I believe, is that it contains zero
>> or more identified member objects, as defined by appropriate equals() method.
>> I know this is hard to specify precisely, but I presume we all “know it when
>> we see it”.
>> There is value to “collections” whose members are not objects but are
>> equivalence classes of objects, as defined by a nonstandard equality test,
>> but I think it is a mistake to call them Collections.
>> If a method takes a parameter of type Collection, it should be free to assume
>> that the parameter object supports the “general contract” of Collection. Is
>> there any plausible alternative?
> 
> [back from vacation]
> 
> Yes, this situation is something of a mess.
> 
> I think the original idea was that the "general contract" was that 
> collections contained elements whose membership was defined by equals(). 
> However, for a very long time now there have been collections in the JDK such 
> as SortedSet and IdentityHashMap (more precisely, IdentityHashMap's keyset). 
> I think we have to accept the fact that these are collections even though 
> their membership semantics differ from those provided by equals().
> 
>> Changing one method in the JDK to support a non-standard Collection parameter
>> does not solve the problem, because non-JDK methods with Collection (etc.)
>> parameters could have similar “misbehavior”. How would the developer know
>> when a specific TreeSet can or cannot be passed to a method? Does every
>> method that accepts a Collection (etc.) parameter require boilerplate (as in
>> the disjoint example) explaining its exact requirements or how it can go
>> wrong?
>> Perhaps it would be useful to define specific behaviors that these
>> nonstandard “collections” might support. For example, a Membership interface
>> (with method contains(e)) would be perfect for a removeAll(Membership) method
>> on Collection, implemented as you propose.
> The wording on Collections.disjoint() is particularly clumsy. I'm not 
> entirely sure what to do about it and its implementation; suggestions 
> welcome. But I also don't think Collections.disjoint() is a very important 
> problem, compared to what's going on in the rest of the framework.
> 
> Introducing a new interface type is basically a non-starter. Objects of type 
> Collection get passed around a lot, and users can usefully rely on things 
> like size(), they can iterate the Collection, etc.
> 
> Once we've admitted to the fact that different collections can have different 
> membership semantics, the main point of this bug (JDK-6982173) is to fix up 
> the semantics of AbstractSet.removeAll() to remove the semantic shifting that 
> can occur based on the relative sizes of the collections.
> 
> There's another bug JDK-8190545 that needs to be fixed at some point, which 
> is a larger scale cleanup to clarify and remove assumptions about collection 
> membership. It starts out from the point that SortedSet and its ilk define 
> membership using a Comparator; yet TreeSet.contains() still talks about using 
> equals(), which is simply incorrect. Pulling on that thread for a while 
> reveals a bunch of places in the spec that need adjustment. See the comments 
> in the bug report and the linked email discussions.
> 
> After these are fixed, the specification and implementation will at least be 
> self-consistent. However, it will still be the case that HashSet, TreeSet, 
> and IdentityHashMap's keyset will have different membership semantics, and if 
> you pass them around interchangeably you might get unexpected results. I 
> think we just have to live with that.
> 
> s'marks
> 

Reply via email to