If we take this route, how about changing the parameter type to Iterable? Alan
> On Feb 13, 2019, at 7:12 PM, Stuart Marks <stuart.ma...@oracle.com> wrote: > > Right, as I mentioned in my earlier emails [1][2] this is related to > JDK-6394757 [3] where the semantics shift depending on the relative sizes of > the collections. This has a complicated history. In JDK-6982173 [4] there is > a lot of discussion about what heuristics to use for iterating this vs the > arg, in order to get the best performance. This assumes that these operations > are equivalent, when they aren't! In JDK-6394757 there's some discussion > about the dubious semantics of this optimization and the potential of simply > removing it. > > At this point I think we need to remove the performance heurstic in order to > get consistent semantics. The performance won't be optimal -- indeed, it > wasn't before -- but at least it should be more predictable. > > Your example is indeed counterintuitive, as it uses the contains() semantics > of the argument -- the string length -- instead of the contains() semantics > of "this". However, it's defensible, and it's supported by the spec, so I > don't really see any alternative. > > The example set is pretty counterintuitive in the first place. For example: > > jshell> Set<String> set = new > TreeSet<>(Comparator.comparingInt(String::length)) > jshell> set.addAll(List.of("a", "if", "the", "when")) > jshell> set > set ==> [a, if, the, when] > jshell> set.contains("foo") > $235 ==> true > > Also, this comparator isn't consistent with equals. This isn't a bug. But it > does give rise to even more counterintuitive behavior: > > jshell> var set2 = Set.of("x", "xx", "xxx", "xxxx") > jshell> set2.equals(set) > $237 ==> false > jshell> set.equals(set2) > $238 ==> true > > Fun with collections! :-) > > s'marks > > > [1] > http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058172.html > > [2] > http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058539.html > > [3] https://bugs.openjdk.java.net/browse/JDK-6394757 > > [4] https://bugs.openjdk.java.net/browse/JDK-6982173 > > > On 2/11/19 2:23 PM, Michael Rasmussen wrote: >> The current implementation seems very counter-intuitive with anything that >> doesn't have the same comparison semantics, for instance a TreeSet/Map with >> a Comparator, Identity-based Set/Map etc. >> The output of the following snippet would probably surprise most: >> /* --- snip --- */ >> import java.util.*; >> class Test { >> public static void main(String[] args) { >> TreeMap<String, List<String>> lengthMap = new >> TreeMap<>(Comparator.comparingInt(String::length)); >> List<String> strings = new ArrayList<>(Arrays.asList("The quick brown >> fox jumps over the lazy dog".split(" "))); >> for (String s : strings) { >> lengthMap.computeIfAbsent(s, k -> new ArrayList<>()).add(s); >> } >> Set<String> toRemove = lengthMap.keySet(); >> System.out.println("List before: " + strings); >> System.out.println("Set to remove: " + toRemove); >> strings.removeAll(toRemove); >> System.out.println("List after:" + strings); >> } >> } >> /* --- snip --- */ >> List before: [The, quick, brown, fox, jumps, over, the, lazy, dog] >> Set to remove: [The, over, quick] >> List after:[] >> /Michael >> From: core-libs-dev <core-libs-dev-boun...@openjdk.java.net> on behalf of >> Tagir Valeev <amae...@gmail.com> >> Sent: 08 February 2019 15:13 >> To: Stuart Marks >> Cc: core-libs-dev; Jan Peter Stotz >> Subject: Re: JDK-6982173: Small problem causing thousands of wasted CPU hours >> Hello! >>> I would argue that iterating the argument and calling remove() on "this" >>> are the >>> right semantics, because you want set membership to be determined by this >>> set, >>> not by whatever collection you pass as an argument. However, I note that >>> AbstractCollection.removeAll and other removeAll implementations iterate >>> over >>> "this" and do a contains() check on the argument. The revised >>> AbstractSet.removeAll would be an outlier here, though it makes sense to me >>> to >>> do it this way. >> For complete picture it should be noted that there's a slight >> difference in remove and removeAll spec: remove removes at most one >> element while removeAll removes all elements from the specified >> collection. >> E.g. c.removeAll(Collections.singleton(foo)) would remove all >> instances of foo from c while c.remove(foo) would return only one foo. >> These should be equivalent for Set where repeating elements should not >> normally appear, but it's wrong for any Collection. That's why >> AbstractCollection.removeAll >> cannot be rewritten in the same way. >> Another interesting thing is >> java.util.IdentityHashMap.KeySet#removeAll implementation [1]: >> it reimplements the AbstractCollection#removeAll, because of the same >> reason: now >> the first branch of AbstractSet#removeAll becomes incorrect. See the >> comment before it: >> /* >> * Must revert from AbstractSet's impl to AbstractCollection's, as >> * the former contains an optimization that results in incorrect >> * behavior when c is a smaller "normal" (non-identity-based) Set. >> */ >> Btw this comment should be updated to remove a "smaller" condition if >> the proposed >> change for AbstractSet will be implemented. >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/e57bcfd7bf79/src/java.base/share/classes/java/util/IdentityHashMap.java#l990 >> With best regards, >> Tagir Valeev >> On Fri, Jan 25, 2019 at 7:11 AM Stuart Marks <stuart.ma...@oracle.com> wrote: >>> >>> On 1/23/19 12:05 PM, Jan Peter Stotz wrote: >>>> like many other I ran into bug JDK-698217 which is about >>>> AbstractSet.removeAll() >>>> and it's "aptitude" in wasting CPU time if you accidentally hit this bug. >>>> And >>>> there are hundred of developers hitting this bug every year. >>>> >>>> I simply don't understand why there was no progress in 8 years, on a severe >>>> performance issue (a removeAll method on an efficient set that can require >>>> O(n^2)!) caused by code that was written to speed-up the removeAll >>>> implementation. >>>> >>>> Which makes this bug worse is that it is triggered by the relative size of >>>> the >>>> current set compared to the collection passed as parameter. >>>> Therefore for most developers this means not to use this buggy function at >>>> all >>>> (once they realized how worse it is). >>> >>> I wasn't aware that hundreds of developers are hitting this bug every year. >>> I >>> haven't seen any mention of it (besides in the bug database) on Twitter, on >>> Reddit, on DZone, at the conferences I attend, or in several years of >>> core-libs-dev emails. Well, it was mentioned on core-libs-dev once in 2011 >>> [1] >>> although that was a suggestion for improvement, not a complaint about >>> performance. >>> >>> Nonetheless this is a real problem, and if it's causing difficulties we can >>> certainly take a look at it. >>> >>> There is, however, more to the story. The ACTUAL problem is a semantic one; >>> see >>> JDK-6394757. [2] Briefly, consider x.removeAll(y). Depending on the relative >>> sizes of x and y, this method might end up using either x's or y's >>> definition of >>> membership, which could differ from each other. (See the bug report for an >>> example.) Thus the semantics of this method depend upon the relative sizes >>> of >>> the collections, which is arguably flawed. >>> >>> Worse, this behavior is specified to iterate this set or the argument, >>> depending >>> upon their relative sizes. [3] So, fixing this will require an incompatible >>> specification change. >>> >>> The obvious way to fix this is to get rid of the "optimizations" (that turn >>> out >>> not to be optimizations at all in some cases) and replace it with a simple >>> loop: >>> >>> public boolean removeAll(Collection<?> c) { >>> Objects.requireNonNull(c); >>> boolean modified = false; >>> for (Object e : c) >>> modified |= remove(e); >>> return modified; >>> } >>> >>> I would argue that iterating the argument and calling remove() on "this" >>> are the >>> right semantics, because you want set membership to be determined by this >>> set, >>> not by whatever collection you pass as an argument. However, I note that >>> AbstractCollection.removeAll and other removeAll implementations iterate >>> over >>> "this" and do a contains() check on the argument. The revised >>> AbstractSet.removeAll would be an outlier here, though it makes sense to me >>> to >>> do it this way. >>> >>> Is it worth the incompatibility? >>> >>> s'marks >>> >>> >>> >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html >>> >>> [2] https://bugs.openjdk.java.net/browse/JDK-6394757 >>> >>> [3] >>> >>> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/AbstractSet.html#removeAll(java.util.Collection) >>> >> >