Hi Alan,

I can sympathize with the performance-loss aspect of this patch, but I nevertheless think it is a move in the right direction. You must admit that the performance optimization in AbstractSet.removeAll was a rather shaky one (dependent on the relative sizes of the two collections) and as such prone to sudden performance spikes that could appear for no apparent reason. More importantly, the fact that the semantics of the method could change drastically depending on something as incidental as the relative sizes of the two collections was inexplicable. You may argue that the semantics of the method are undefined if it is called on, or passed, an ersatz Collection instance, but even undefined semantics can at least be consistent - i.e. independent of such things as the relative sizes of the two collections.
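To make that concrete, here is a small self-contained sketch that reproduces the size-based branch of the pre-patch AbstractSet.removeAll (the helper name removeAllOptimized is mine, not the JDK's). With a case-insensitive TreeSet as the argument, the outcome flips depending on which collection happens to be larger:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RemoveAllDemo {
    // Reproduction of the size-based optimization as it stood before the
    // patch: which collection's membership semantics apply depends on
    // which collection is larger.
    public static boolean removeAllOptimized(Set<Object> s, Collection<?> c) {
        boolean modified = false;
        if (s.size() > c.size()) {
            // Iterate the argument; uses s's notion of equality via s.remove.
            for (Object e : c)
                modified |= s.remove(e);
        } else {
            // Iterate the receiver; uses c's notion of equality via c.contains.
            for (Iterator<?> i = s.iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        // Case 1: receiver larger -> s1.remove("A") uses HashSet's
        // case-sensitive equality, so nothing is removed.
        Set<Object> s1 = new HashSet<>(List.of("a", "b", "c"));
        Set<String> t1 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        t1.add("A");
        removeAllOptimized(s1, t1);
        System.out.println(s1.size()); // 3

        // Case 2: receiver smaller -> t2.contains("a") uses the
        // case-insensitive comparator, so "a" is removed.
        Set<Object> s2 = new HashSet<>(List.of("a"));
        Set<String> t2 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        t2.addAll(List.of("A", "X", "Y"));
        removeAllOptimized(s2, t2);
        System.out.println(s2.size()); // 0
    }
}
```

The logical operation is the same in both cases ("remove t's elements from s"), yet the result differs purely because of the size comparison.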

In a perfect world, this optimization would never have been there to begin with. People would eventually find bottlenecks when using this method and change their code to use explicit iteration or the like. There may be perfectly valid code out there that uses conformant collection implementations which don't cause AbstractSet.removeAll to behave inconsistently, and such code may perform badly after this patch. But it will behave consistently for cases that use ersatz Collections, and that is more important in my opinion.

As Stuart says, optimizations mustn't change semantics. So is there perhaps a narrower optimization that doesn't change the semantics?

Here's a sketch of one:
- create a marker interface (could be JDK-internal) and mark all conforming Set implementations with it
- use the AbstractSet.removeAll optimization only when both collections implement the marker interface
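Roughly what that could look like (all names here are invented for the sketch; this is not the actual JDK change):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;

// Hypothetical marker interface: implemented only by Set types whose
// membership test is equals-based, so iterating either side of
// removeAll yields the same result.
interface EqualsBasedMembership {}

class MarkedHashSet<E> extends HashSet<E> implements EqualsBasedMembership {
    @Override
    public boolean removeAll(Collection<?> c) {
        boolean modified = false;
        // Fast path (iterate the smaller argument) only when BOTH
        // collections are known to use equals-based membership.
        if (c instanceof EqualsBasedMembership && size() > c.size()) {
            for (Object e : c)
                modified |= remove(e);
        } else {
            // Otherwise always iterate the receiver and ask c.contains:
            // the outcome no longer depends on relative sizes.
            for (Iterator<?> i = iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }
}
```

When both sets carry the marker, the fast path and the slow path agree by construction, so the optimization can no longer be observed through the semantics.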


Regards, Peter

On 5/17/19 7:06 AM, Alan Snyder wrote:
Hi Stuart,

I believe I already accepted the fact of ersatz Collections.

Could you explain the inconsistency in the specification that affects 
removeAll()? I don’t see it.

In the current specification, the classes that can create ersatz Collections 
all call out this behavior in their documentation.

SortedSet:
Note that the ordering maintained by a sorted set (whether or not an
explicit comparator is provided) must be <i>consistent with equals</i> if
the sorted set is to correctly implement the {@code Set} interface.
The behavior of a sorted set <i>is</i> well-defined even if its
ordering is inconsistent with equals; it just fails to obey the general
contract of the {@code Set} interface.

TreeSet is similar
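For illustration, a comparator that is not consistent with equals (here, ordering strings by length) makes a TreeSet deviate from the Set contract while still behaving in a well-defined way:

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class InconsistentWithEquals {
    public static void main(String[] args) {
        // Ordering by length is not consistent with equals: distinct
        // strings of the same length compare as 0.
        Set<String> byLength =
                new TreeSet<>(Comparator.comparingInt(String::length));
        byLength.add("aa");
        System.out.println(byLength.add("bb"));      // false: compares equal to "aa"
        System.out.println(byLength.contains("zz")); // true, though !"zz".equals("aa")
        System.out.println(byLength.size());         // 1
    }
}
```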

IdentityHashMap:

This class is <i>not</i> a general-purpose {@code Map} implementation!  While 
this class implements the {@code Map} interface, it
intentionally violates {@code Map's} general contract, which mandates the
use of the {@code equals} method when comparing objects.

IdentityHashMap.keySet:

While the object returned by this method implements the
{@code Set} interface, it does <i>not</i> obey {@code Set's} general
contract.  Like its backing map, the set returned by this method
defines element equality as reference-equality rather than
object-equality.  This affects the behavior of its {@code contains},
{@code remove}, {@code containsAll}, {@code equals}, and
{@code hashCode} methods.
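A short illustration of that reference-equality behavior:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

class IdentityKeySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new IdentityHashMap<>();
        String k1 = new String("key");
        String k2 = new String("key"); // equals(k1), but a distinct object
        m.put(k1, 1);
        m.put(k2, 2);                  // not an overwrite: keys compared by ==

        Set<String> keys = m.keySet();
        System.out.println(keys.size());                      // 2
        System.out.println(keys.contains(k1));                // true
        System.out.println(keys.contains(new String("key"))); // false: equal value, different object
    }
}
```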

The logical implication of not supporting the general contract is that 
Collection operations performed on an ersatz Collection may have
unexpected (implementation-specific) effects, which in turn implies that 
passing one of these ersatz Collections as a Collection-typed parameter
may have unexpected (implementation-specific) effects.

In other words, whether removeAll is performed on an ersatz Collection or its 
parameter is an ersatz Collection, the effects may be
implementation-specific and inconsistent with the specification of removeAll.

That is not to say that improving the design would not have benefit; of course 
it does. The bug reports you cite prove that.
However, the compatibility impact of the change needs to be taken into account.

   Alan



On May 16, 2019, at 8:42 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:

Hi Alan,

Whether you call them "ersatz" collections, "non-standard" collections, or 
collections with different membership semantics, they have been around in the collections framework 
from day one. This is perhaps unfortunate, and it may be considered bad design, but it is a 
fact.

The issue is not with undefined or implementation-specific behavior. The issue 
is that the specification as it stands is self-contradictory. In particular, it 
assumes that certain operations are symmetric that in fact are not, if you read 
other parts of the spec and derive logical conclusions from them. This is what 
I'm trying to fix. This is not a mere matter of intellectual consistency. 
This state of affairs has real consequences. There is a whole family of bugs 
linked to this one which complain both about the performance issue and the 
semantic issue. All of these bugs are directly related to the 
self-contradictory nature of the current specification. With my changes, the 
spec is self-contradictory in fewer places. (There is, however, more work to be 
done; see JDK-8190545.)

Your lead point, though, is about performance. The performance optimization of 
AbstractSet.removeAll is one of those that assumes that the operation is 
symmetric, when in fact it isn't. As a rule, optimizations mustn't change 
semantics. Therefore, it has to go.

s'marks

