Oswaldo Olivo created COLLECTIONS-545:
-----------------------------------------

             Summary: Undocumented performance issue in the removeAll method in 
CollectionUtils
                 Key: COLLECTIONS-545
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-545
             Project: Commons Collections
          Issue Type: Bug
          Components: Collection
    Affects Versions: 4.1
         Environment: Ubuntu 14.04
            Reporter: Oswaldo Olivo
            Priority: Trivial


This bug is analogous to https://issues.apache.org/jira/browse/COLLECTIONS-544

The method removeAll in CollectionUtils is inefficient when the second 
parameter collection has a slow containment method.

The following is the current implementation with its documentation:
============================
     /**
     * Removes the elements in <code>remove</code> from 
<code>collection</code>. That is, this
     * method returns a collection containing all the elements in <code>c</code>
     * that are not in <code>remove</code>. The cardinality of an element 
<code>e</code>
     * in the returned collection is the same as the cardinality of 
<code>e</code>
     * in <code>collection</code> unless <code>remove</code> contains 
<code>e</code>, in which
     * case the cardinality is zero. This method is useful if you do not wish 
to modify
     * the collection <code>c</code> and thus cannot call 
<code>collection.removeAll(remove);</code>.
     *
     * @param <E>  the type of object the {@link Collection} contains
     * @param collection  the collection from which items are removed (in the 
returned collection)
     * @param remove  the items to be removed from the returned 
<code>collection</code>
     * @return a <code>Collection</code> containing all the elements of 
<code>collection</code> except
     * any elements that also occur in <code>remove</code>.
     * @throws NullPointerException if either parameter is null
     * @since 4.0 (method existed in 3.2 but was completely broken)
     */
    public static <E> Collection<E> removeAll(final Collection<E> collection, 
final Collection<?> remove) {
        return ListUtils.removeAll(collection, remove);
    }


=======================================

We can notice the inefficiency by looking at the removeAll method in ListUtils.
The removeAll method from ListUtils is implemented and documented as follows:

=======================================

     /**
     * Removes the elements in <code>remove</code> from 
<code>collection</code>. That is, this
     * method returns a list containing all the elements in 
<code>collection</code>
     * that are not in <code>remove</code>. The cardinality of an element 
<code>e</code>
     * in the returned collection is the same as the cardinality of 
<code>e</code>
     * in <code>collection</code> unless <code>remove</code> contains 
<code>e</code>, in which
     * case the cardinality is zero. This method is useful if you do not wish 
to modify
     * <code>collection</code> and thus cannot call 
<code>collection.removeAll(remove);</code>.
     * <p>
     * This implementation iterates over <code>collection</code>, checking each 
element in
     * turn to see if it's contained in <code>remove</code>. If it's not 
contained, it's added
     * to the returned list. As a consequence, it is advised to use a 
collection type for
     * <code>remove</code> that provides a fast (e.g. O(1)) implementation of
     * {@link Collection#contains(Object)}.
     *
     * @param <E>  the element type
     * @param collection  the collection from which items are removed (in the 
returned collection)
     * @param remove  the items to be removed from the returned 
<code>collection</code>
     * @return a <code>List</code> containing all the elements of 
<code>c</code> except
     * any elements that also occur in <code>remove</code>.
     * @throws NullPointerException if either parameter is null
     * @since 3.2
     */
    public static <E> List<E> removeAll(final Collection<E> collection, final 
Collection<?> remove) {
        final List<E> list = new ArrayList<E>();
        for (final E obj : collection) {
            if (!remove.contains(obj)) {
                list.add(obj);
            }
        }
        return list;
    }

=======================================

In the case of ListUtils:removeAll, the inefficiency is properly documented.

Perhaps the disclaimer about potential inefficiencies depending on the type 
of the parameter collection in ListUtils:removeAll should also be included in 
CollectionUtils:removeAll.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to