If we take this route, how about changing the parameter type to Iterable?

  Alan



> On Feb 13, 2019, at 7:12 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> Right, as I mentioned in my earlier emails [1][2] this is related to 
> JDK-6394757 [3] where the semantics shift depending on the relative sizes of 
> the collections. This has a complicated history. In JDK-6982173 [4] there is 
> a lot of discussion about what heuristics to use for iterating this vs the 
> arg, in order to get the best performance. This assumes that these operations 
> are equivalent, when they aren't! In JDK-6394757 there's some discussion 
> about the dubious semantics of this optimization and the potential of simply 
> removing it.
> 
> At this point I think we need to remove the performance heurstic in order to 
> get consistent semantics. The performance won't be optimal -- indeed, it 
> wasn't before -- but at least it should be more predictable.
> 
> Your example is indeed counterintuitive, as it uses the contains() semantics 
> of the argument -- the string length -- instead of the contains() semantics 
> of "this". However, it's defensible, and it's supported by the spec, so I 
> don't really see any alternative.
> 
> The example set is pretty counterintuitive in the first place. For example:
> 
> jshell> Set<String> set = new 
> TreeSet<>(Comparator.comparingInt(String::length))
> jshell> set.addAll(List.of("a", "if", "the", "when"))
> jshell> set
> set ==> [a, if, the, when]
> jshell> set.contains("foo")
> $235 ==> true
> 
> Also, this comparator isn't consistent with equals. This isn't a bug. But it 
> does give rise to even more counterintuitive behavior:
> 
> jshell> var set2 = Set.of("x", "xx", "xxx", "xxxx")
> jshell> set2.equals(set)
> $237 ==> false
> jshell> set.equals(set2)
> $238 ==> true
> 
> Fun with collections! :-)
> 
> s'marks
> 
> 
> [1] 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058172.html
> 
> [2] 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058539.html
> 
> [3] https://bugs.openjdk.java.net/browse/JDK-6394757
> 
> [4] https://bugs.openjdk.java.net/browse/JDK-6982173
> 
> 
> On 2/11/19 2:23 PM, Michael Rasmussen wrote:
>> The current implementation seems very counter-intuitive with anything that 
>> doesn't have the same comparison semantics, for instance a TreeSet/Map with 
>> a Comparator, Identity-based Set/Map etc.
>> The output of the following snippet would probably surprise most:
>> /* --- snip --- */
>> import java.util.*;
>> class Test {
>>   public static void main(String[] args) {
>>     TreeMap<String, List<String>> lengthMap = new 
>> TreeMap<>(Comparator.comparingInt(String::length));
>>     List<String> strings = new ArrayList<>(Arrays.asList("The quick brown 
>> fox jumps over the lazy dog".split(" ")));
>>     for (String s : strings) {
>>       lengthMap.computeIfAbsent(s, k -> new ArrayList<>()).add(s);
>>     }
>>       Set<String> toRemove = lengthMap.keySet();
>>       System.out.println("List before: " + strings);
>>     System.out.println("Set to remove: " + toRemove);
>>     strings.removeAll(toRemove);
>>     System.out.println("List after:" + strings);
>>   }
>> }
>> /* --- snip --- */
>> List before: [The, quick, brown, fox, jumps, over, the, lazy, dog]
>> Set to remove: [The, over, quick]
>> List after:[]
>> /Michael
>> From: core-libs-dev <core-libs-dev-boun...@openjdk.java.net> on behalf of 
>> Tagir Valeev <amae...@gmail.com>
>> Sent: 08 February 2019 15:13
>> To: Stuart Marks
>> Cc: core-libs-dev; Jan Peter Stotz
>> Subject: Re: JDK-6982173: Small problem causing thousands of wasted CPU hours
>>   Hello!
>>> I would argue that iterating the argument and calling remove() on "this" 
>>> are the
>>> right semantics, because you want set membership to be determined by this 
>>> set,
>>> not by whatever collection you pass as an argument. However, I note that
>>> AbstractCollection.removeAll and other removeAll implementations iterate 
>>> over
>>> "this" and do a contains() check on the argument. The revised
>>> AbstractSet.removeAll would be an outlier here, though it makes sense to me 
>>> to
>>> do it this way.
>> For complete picture it should be noted that there's a slight
>> difference in remove and removeAll spec: remove removes at most one
>> element while removeAll removes all elements from the specified
>> collection.
>> E.g. c.removeAll(Collections.singleton(foo)) would remove all
>> instances of foo from c while c.remove(foo) would return only one foo.
>> These should be equivalent for Set where repeating elements should not
>> normally appear, but it's wrong for any Collection. That's why
>> AbstractCollection.removeAll
>> cannot be rewritten in the same way.
>> Another interesting thing is
>> java.util.IdentityHashMap.KeySet#removeAll implementation [1]:
>> it reimplements the AbstractCollection#removeAll, because of the same
>> reason: now
>> the first branch of AbstractSet#removeAll becomes incorrect. See the
>> comment before it:
>>         /*
>>          * Must revert from AbstractSet's impl to AbstractCollection's, as
>>          * the former contains an optimization that results in incorrect
>>          * behavior when c is a smaller "normal" (non-identity-based) Set.
>>          */
>> Btw this comment should be updated to remove a "smaller" condition if
>> the proposed
>> change for AbstractSet will be implemented.
>> [1]  
>> http://hg.openjdk.java.net/jdk/jdk/file/e57bcfd7bf79/src/java.base/share/classes/java/util/IdentityHashMap.java#l990
>> With best regards,
>> Tagir Valeev
>> On Fri, Jan 25, 2019 at 7:11 AM Stuart Marks <stuart.ma...@oracle.com> wrote:
>>> 
>>> On 1/23/19 12:05 PM, Jan Peter Stotz wrote:
>>>> like many other I ran into bug JDK-698217 which is about 
>>>> AbstractSet.removeAll()
>>>> and it's "aptitude" in wasting CPU time if you accidentally hit this bug. 
>>>> And
>>>> there are hundred of developers hitting this bug every year.
>>>> 
>>>> I simply don't understand why there was no progress in 8 years, on a severe
>>>> performance issue (a removeAll method on an efficient set that can require
>>>> O(n^2)!) caused by code that was written to speed-up the removeAll 
>>>> implementation.
>>>> 
>>>> Which makes this bug worse is that it is triggered by the relative size of 
>>>> the
>>>> current set compared to the collection passed as parameter.
>>>> Therefore for most developers this means not to use this buggy function at 
>>>> all
>>>> (once they realized how worse it is).
>>> 
>>> I wasn't aware that hundreds of developers are hitting this bug every year. 
>>> I
>>> haven't seen any mention of it (besides in the bug database) on Twitter, on
>>> Reddit, on DZone, at the conferences I attend, or in several years of
>>> core-libs-dev emails. Well, it was mentioned on core-libs-dev once in 2011 
>>> [1]
>>> although that was a suggestion for improvement, not a complaint about 
>>> performance.
>>> 
>>> Nonetheless this is a real problem, and if it's causing difficulties we can
>>> certainly take a look at it.
>>> 
>>> There is, however, more to the story. The ACTUAL problem is a semantic one; 
>>> see
>>> JDK-6394757. [2] Briefly, consider x.removeAll(y). Depending on the relative
>>> sizes of x and y, this method might end up using either x's or y's 
>>> definition of
>>> membership, which could differ from each other. (See the bug report for an
>>> example.) Thus the semantics of this method depend upon the relative sizes 
>>> of
>>> the collections, which is arguably flawed.
>>> 
>>> Worse, this behavior is specified to iterate this set or the argument, 
>>> depending
>>> upon their relative sizes. [3] So, fixing this will require an incompatible
>>> specification change.
>>> 
>>> The obvious way to fix this is to get rid of the "optimizations" (that turn 
>>> out
>>> not to be optimizations at all in some cases) and replace it with a simple 
>>> loop:
>>> 
>>>       public boolean removeAll(Collection<?> c) {
>>>           Objects.requireNonNull(c);
>>>           boolean modified = false;
>>>           for (Object e : c)
>>>               modified |= remove(e);
>>>           return modified;
>>>       }
>>> 
>>> I would argue that iterating the argument and calling remove() on "this" 
>>> are the
>>> right semantics, because you want set membership to be determined by this 
>>> set,
>>> not by whatever collection you pass as an argument. However, I note that
>>> AbstractCollection.removeAll and other removeAll implementations iterate 
>>> over
>>> "this" and do a contains() check on the argument. The revised
>>> AbstractSet.removeAll would be an outlier here, though it makes sense to me 
>>> to
>>> do it this way.
>>> 
>>> Is it worth the incompatibility?
>>> 
>>> s'marks
>>> 
>>> 
>>> 
>>> 
>>> [1]  
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html
>>> 
>>> [2] https://bugs.openjdk.java.net/browse/JDK-6394757
>>> 
>>> [3]
>>>  
>>> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/AbstractSet.html#removeAll(java.util.Collection)
>>> 
>>     
> 

Reply via email to