Hello! > I would argue that iterating the argument and calling remove() on "this" are > the > right semantics, because you want set membership to be determined by this set, > not by whatever collection you pass as an argument. However, I note that > AbstractCollection.removeAll and other removeAll implementations iterate over > "this" and do a contains() check on the argument. The revised > AbstractSet.removeAll would be an outlier here, though it makes sense to me to > do it this way.
For complete picture it should be noted that there's a slight difference in remove and removeAll spec: remove removes at most one element while removeAll removes all elements from the specified collection. E.g. c.removeAll(Collections.singleton(foo)) would remove all instances of foo from c while c.remove(foo) would return only one foo. These should be equivalent for Set where repeating elements should not normally appear, but it's wrong for any Collection. That's why AbstractCollection.removeAll cannot be rewritten in the same way. Another interesting thing is java.util.IdentityHashMap.KeySet#removeAll implementation [1]: it reimplements the AbstractCollection#removeAll, because of the same reason: now the first branch of AbstractSet#removeAll becomes incorrect. See the comment before it: /* * Must revert from AbstractSet's impl to AbstractCollection's, as * the former contains an optimization that results in incorrect * behavior when c is a smaller "normal" (non-identity-based) Set. */ Btw this comment should be updated to remove a "smaller" condition if the proposed change for AbstractSet will be implemented. [1] http://hg.openjdk.java.net/jdk/jdk/file/e57bcfd7bf79/src/java.base/share/classes/java/util/IdentityHashMap.java#l990 With best regards, Tagir Valeev On Fri, Jan 25, 2019 at 7:11 AM Stuart Marks <stuart.ma...@oracle.com> wrote: > > On 1/23/19 12:05 PM, Jan Peter Stotz wrote: > > like many other I ran into bug JDK-698217 which is about > > AbstractSet.removeAll() > > and it's "aptitude" in wasting CPU time if you accidentally hit this bug. > > And > > there are hundred of developers hitting this bug every year. > > > > I simply don't understand why there was no progress in 8 years, on a severe > > performance issue (a removeAll method on an efficient set that can require > > O(n^2)!) caused by code that was written to speed-up the removeAll > > implementation. > > > > Which makes this bug worse is that it is triggered by the relative size of > > the > > current set compared to the collection passed as parameter. > > Therefore for most developers this means not to use this buggy function at > > all > > (once they realized how worse it is). > > I wasn't aware that hundreds of developers are hitting this bug every year. I > haven't seen any mention of it (besides in the bug database) on Twitter, on > Reddit, on DZone, at the conferences I attend, or in several years of > core-libs-dev emails. Well, it was mentioned on core-libs-dev once in 2011 [1] > although that was a suggestion for improvement, not a complaint about > performance. > > Nonetheless this is a real problem, and if it's causing difficulties we can > certainly take a look at it. > > There is, however, more to the story. The ACTUAL problem is a semantic one; > see > JDK-6394757. [2] Briefly, consider x.removeAll(y). Depending on the relative > sizes of x and y, this method might end up using either x's or y's definition > of > membership, which could differ from each other. (See the bug report for an > example.) Thus the semantics of this method depend upon the relative sizes of > the collections, which is arguably flawed. > > Worse, this behavior is specified to iterate this set or the argument, > depending > upon their relative sizes. [3] So, fixing this will require an incompatible > specification change. > > The obvious way to fix this is to get rid of the "optimizations" (that turn > out > not to be optimizations at all in some cases) and replace it with a simple > loop: > > public boolean removeAll(Collection<?> c) { > Objects.requireNonNull(c); > boolean modified = false; > for (Object e : c) > modified |= remove(e); > return modified; > } > > I would argue that iterating the argument and calling remove() on "this" are > the > right semantics, because you want set membership to be determined by this set, > not by whatever collection you pass as an argument. However, I note that > AbstractCollection.removeAll and other removeAll implementations iterate over > "this" and do a contains() check on the argument. The revised > AbstractSet.removeAll would be an outlier here, though it makes sense to me to > do it this way. > > Is it worth the incompatibility? > > s'marks > > > > > [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html > > [2] https://bugs.openjdk.java.net/browse/JDK-6394757 > > [3] > https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/AbstractSet.html#removeAll(java.util.Collection) >