[ 
https://issues.apache.org/jira/browse/COLLECTIONS-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno P. Kinoshita updated COLLECTIONS-544:
-------------------------------------------
    Description: 
The method {{retainAll}} in {{CollectionUtils}} is inefficient when the 
parameter collection has a slow containment method.

The following is the current implementation with its documentation:

{noformat}
 /**
     * Returns a collection containing all the elements in 
<code>collection</code>
     * that are also in <code>retain</code>. The cardinality of an element 
<code>e</code>
     * in the returned collection is the same as the cardinality of 
<code>e</code>
     * in <code>collection</code> unless <code>retain</code> does not contain 
<code>e</code>, in which
     * case the cardinality is zero. This method is useful if you do not wish 
to modify
     * the collection <code>c</code> and thus cannot call 
<code>c.retainAll(retain);</code>.
     *
     * @param <C>  the type of object the {@link Collection} contains
     * @param collection  the collection whose contents are the target of the 
#retailAll operation
     * @param retain  the collection containing the elements to be retained in 
the returned collection
     * @return a <code>Collection</code> containing all the elements of 
<code>collection</code>
     * that occur at least once in <code>retain</code>.
     * @throws NullPointerException if either parameter is null
     * @since 3.2
     */
    public static <C> Collection<C> retainAll(final Collection<C> collection, 
final Collection<?> retain) {
        return ListUtils.retainAll(collection, retain);
    }
{noformat}

We can notice the inefficiency by looking at the {{retainAll}} method in 
{{ListUtils}}.

The {{retainAll}} method from {{ListUtils}} is implemented and documented as 
follows:

{noformat}
  /**
     * Returns a List containing all the elements in <code>collection</code>
     * that are also in <code>retain</code>. The cardinality of an element 
<code>e</code>
     * in the returned list is the same as the cardinality of <code>e</code>
     * in <code>collection</code> unless <code>retain</code> does not contain 
<code>e</code>, in which
     * case the cardinality is zero. This method is useful if you do not wish 
to modify
     * the collection <code>c</code> and thus cannot call 
<code>collection.retainAll(retain);</code>.
     * <p>
     * This implementation iterates over <code>collection</code>, checking each 
element in
     * turn to see if it's contained in <code>retain</code>. If it's contained, 
it's added
     * to the returned list. As a consequence, it is advised to use a 
collection type for
     * <code>retain</code> that provides a fast (e.g. O(1)) implementation of
     * {@link Collection#contains(Object)}.
     *
     * @param <E>  the element type
     * @param collection  the collection whose contents are the target of the 
#retailAll operation
     * @param retain  the collection containing the elements to be retained in 
the returned collection
     * @return a <code>List</code> containing all the elements of <code>c</code>
     * that occur at least once in <code>retain</code>.
     * @throws NullPointerException if either parameter is null
     * @since 3.2
     */
    public static <E> List<E> retainAll(final Collection<E> collection, final 
Collection<?> retain) {
        final List<E> list = new ArrayList<E>(Math.min(collection.size(), 
retain.size()));

        for (final E obj : collection) {
            if (retain.contains(obj)) {
                list.add(obj);
            }
        }
        return list;
    }
{noformat}

In the case of {{ListUtils#retainAll}}, the inefficiency is properly documented.

Perhaps the disclaimer about potential inefficiencies depending on the type 
of the parameter collection in ListUtils:retainAll should also be included in 
{{CollectionUtils#retainAll}}.

  was:
The method retainAll in CollectionUtils is inefficient when the parameter 
collection has a slow containment method.

The following is the current implementation with its documentation:
============================
 /**
     * Returns a collection containing all the elements in 
<code>collection</code>
     * that are also in <code>retain</code>. The cardinality of an element 
<code>e</code>
     * in the returned collection is the same as the cardinality of 
<code>e</code>
     * in <code>collection</code> unless <code>retain</code> does not contain 
<code>e</code>, in which
     * case the cardinality is zero. This method is useful if you do not wish 
to modify
     * the collection <code>c</code> and thus cannot call 
<code>c.retainAll(retain);</code>.
     *
     * @param <C>  the type of object the {@link Collection} contains
     * @param collection  the collection whose contents are the target of the 
#retailAll operation
     * @param retain  the collection containing the elements to be retained in 
the returned collection
     * @return a <code>Collection</code> containing all the elements of 
<code>collection</code>
     * that occur at least once in <code>retain</code>.
     * @throws NullPointerException if either parameter is null
     * @since 3.2
     */
    public static <C> Collection<C> retainAll(final Collection<C> collection, 
final Collection<?> retain) {
        return ListUtils.retainAll(collection, retain);
    }

=======================================

We can notice the inefficiency by looking at the retainAll method in ListUtils.
The retainAll method from ListUtils is implemented and documented as follows:

=======================================

  /**
     * Returns a List containing all the elements in <code>collection</code>
     * that are also in <code>retain</code>. The cardinality of an element 
<code>e</code>
     * in the returned list is the same as the cardinality of <code>e</code>
     * in <code>collection</code> unless <code>retain</code> does not contain 
<code>e</code>, in which
     * case the cardinality is zero. This method is useful if you do not wish 
to modify
     * the collection <code>c</code> and thus cannot call 
<code>collection.retainAll(retain);</code>.
     * <p>
     * This implementation iterates over <code>collection</code>, checking each 
element in
     * turn to see if it's contained in <code>retain</code>. If it's contained, 
it's added
     * to the returned list. As a consequence, it is advised to use a 
collection type for
     * <code>retain</code> that provides a fast (e.g. O(1)) implementation of
     * {@link Collection#contains(Object)}.
     *
     * @param <E>  the element type
     * @param collection  the collection whose contents are the target of the 
#retailAll operation
     * @param retain  the collection containing the elements to be retained in 
the returned collection
     * @return a <code>List</code> containing all the elements of <code>c</code>
     * that occur at least once in <code>retain</code>.
     * @throws NullPointerException if either parameter is null
     * @since 3.2
     */
    public static <E> List<E> retainAll(final Collection<E> collection, final 
Collection<?> retain) {
        final List<E> list = new ArrayList<E>(Math.min(collection.size(), 
retain.size()));

        for (final E obj : collection) {
            if (retain.contains(obj)) {
                list.add(obj);
            }
        }
        return list;
    }
{noformat}

In the case of ListUtils:retainAll, the inefficiency is properly documented.

Perhaps the disclaimer about potential inefficiencies depending on the type 
of the parameter collection in ListUtils:retainAll should also be included in 
CollectionUtils:retainAll.


> Undocumented performance issue in the retainAll method in CollectionUtils 
> --------------------------------------------------------------------------
>
>                 Key: COLLECTIONS-544
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-544
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Collection
>    Affects Versions: 4.1
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Trivial
>              Labels: Collections, documentaion, performance
>
> The method {{retainAll}} in {{CollectionUtils}} is inefficient when the 
> parameter collection has a slow containment method.
> The following is the current implementation with its documentation:
> {noformat}
>  /**
>      * Returns a collection containing all the elements in 
> <code>collection</code>
>      * that are also in <code>retain</code>. The cardinality of an element 
> <code>e</code>
>      * in the returned collection is the same as the cardinality of 
> <code>e</code>
>      * in <code>collection</code> unless <code>retain</code> does not contain 
> <code>e</code>, in which
>      * case the cardinality is zero. This method is useful if you do not wish 
> to modify
>      * the collection <code>c</code> and thus cannot call 
> <code>c.retainAll(retain);</code>.
>      *
>      * @param <C>  the type of object the {@link Collection} contains
>      * @param collection  the collection whose contents are the target of the 
> #retailAll operation
>      * @param retain  the collection containing the elements to be retained 
> in the returned collection
>      * @return a <code>Collection</code> containing all the elements of 
> <code>collection</code>
>      * that occur at least once in <code>retain</code>.
>      * @throws NullPointerException if either parameter is null
>      * @since 3.2
>      */
>     public static <C> Collection<C> retainAll(final Collection<C> collection, 
> final Collection<?> retain) {
>         return ListUtils.retainAll(collection, retain);
>     }
> {noformat}
> We can notice the inefficiency by looking at the {{retainAll}} method in 
> {{ListUtils}}.
> The {{retainAll}} method from {{ListUtils}} is implemented and documented as 
> follows:
> {noformat}
>   /**
>      * Returns a List containing all the elements in <code>collection</code>
>      * that are also in <code>retain</code>. The cardinality of an element 
> <code>e</code>
>      * in the returned list is the same as the cardinality of <code>e</code>
>      * in <code>collection</code> unless <code>retain</code> does not contain 
> <code>e</code>, in which
>      * case the cardinality is zero. This method is useful if you do not wish 
> to modify
>      * the collection <code>c</code> and thus cannot call 
> <code>collection.retainAll(retain);</code>.
>      * <p>
>      * This implementation iterates over <code>collection</code>, checking 
> each element in
>      * turn to see if it's contained in <code>retain</code>. If it's 
> contained, it's added
>      * to the returned list. As a consequence, it is advised to use a 
> collection type for
>      * <code>retain</code> that provides a fast (e.g. O(1)) implementation of
>      * {@link Collection#contains(Object)}.
>      *
>      * @param <E>  the element type
>      * @param collection  the collection whose contents are the target of the 
> #retailAll operation
>      * @param retain  the collection containing the elements to be retained 
> in the returned collection
>      * @return a <code>List</code> containing all the elements of 
> <code>c</code>
>      * that occur at least once in <code>retain</code>.
>      * @throws NullPointerException if either parameter is null
>      * @since 3.2
>      */
>     public static <E> List<E> retainAll(final Collection<E> collection, final 
> Collection<?> retain) {
>         final List<E> list = new ArrayList<E>(Math.min(collection.size(), 
> retain.size()));
>         for (final E obj : collection) {
>             if (retain.contains(obj)) {
>                 list.add(obj);
>             }
>         }
>         return list;
>     }
> {noformat}
> In the case of {{ListUtils#retainAll}}, the inefficiency is properly 
> documented.
> Perhaps the disclaimer about potential inefficiencies depending on the type 
> of the parameter collection in ListUtils:retainAll should also be included in 
> {{CollectionUtils#retainAll}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to