Hi Jan,

I'm amazed to see this performance problem has persisted for so long. As a
Java developer, I wasn't even aware of it, so thanks for the information.

>
> The relevant part of the current implementation is as follows:
>
>          if (size() > c.size()) {
>              for (Object e : c)
>                  modified |= remove(e);
>          } else {
>              for (Iterator<?> i = iterator(); i.hasNext(); ) {
>                  if (c.contains(i.next())) {
>                      i.remove();
>                      modified = true;
>                  }
>              }
>          }
>
> The problem is that the code in the first if block only makes sense if
> the collections passed as parameter implements the remove(Object o)
> method in an efficient way - this means with a complexity of less than
> O(n).

Now, how have you come to the conclusion that the problem is in the first
branch of the above if statement, and that it is related to the passed
collection's implementation details of the remove(Object o) method? I think
this is not the problem, as elements are removed from this set and not from
the passed collection.

In fact, the problem is in the else branch (when this set is smaller than
the passed collection). The problem is in the following line:

                  if (c.contains(i.next())) {

That c.contains(...) might not be implemented in an optimized way, i.e. c
might be an ArrayList, which uses a linear search to check if the element
belongs to it.

> Java currently has no way to identify Collection implementations that
> satisfy this property (...)

Yes, you are correct. While there exists the RandomAccess marker interface
for List implementations, there is no analogous marker interface for
neither Set nor Map implementations that use hashing, i.e. something that
could have been named "HashAccess".

So I wonder and if the members of this group find it plausible to add such
HashAccess marker interface to HashSet and HashMap (and to any other
implementations that use hashing). If (as I suspect) introducing further
marker interfaces is not desirable (or has been tacitly forbidden), then
what alternatives are available and would be a good fit here?

For the record, if the HashAccess marker interface had ever existed, the
implementation could have been changed to:

          if (size() > c.size()) {
              for (Object e : c)
                  modified |= remove(e);
          } else {
              Collection<?> cc = c instanceof HashAccess ? c : new
HashSet<>(c);
              for (Iterator<?> i = iterator(); i.hasNext(); ) {
                  if (cc.contains(i.next())) {
                      i.remove();
                      modified = true;
                  }
              }
          }

This way, the contains invocation in `if (cc.contains(i.next())) {` would
have been optimized.

Thanks,
fps.-

Reply via email to