[
https://issues.apache.org/jira/browse/COLLECTIONS-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bruno P. Kinoshita updated COLLECTIONS-544:
-------------------------------------------
Description:
The method retainAll in CollectionUtils is inefficient when the parameter
collection has a slow containment method.
The following is the current implementation with its documentation:
============================
/**
* Returns a collection containing all the elements in
<code>collection</code>
* that are also in <code>retain</code>. The cardinality of an element
<code>e</code>
* in the returned collection is the same as the cardinality of
<code>e</code>
* in <code>collection</code> unless <code>retain</code> does not contain
<code>e</code>, in which
* case the cardinality is zero. This method is useful if you do not wish
to modify
* the collection <code>c</code> and thus cannot call
<code>c.retainAll(retain);</code>.
*
* @param <C> the type of object the {@link Collection} contains
* @param collection the collection whose contents are the target of the
#retailAll operation
* @param retain the collection containing the elements to be retained in
the returned collection
* @return a <code>Collection</code> containing all the elements of
<code>collection</code>
* that occur at least once in <code>retain</code>.
* @throws NullPointerException if either parameter is null
* @since 3.2
*/
public static <C> Collection<C> retainAll(final Collection<C> collection,
final Collection<?> retain) {
return ListUtils.retainAll(collection, retain);
}
=======================================
We can notice the inefficiency by looking at the retainAll method in ListUtils.
The retainAll method from ListUtils is implemented and documented as follows:
=======================================
/**
* Returns a List containing all the elements in <code>collection</code>
* that are also in <code>retain</code>. The cardinality of an element
<code>e</code>
* in the returned list is the same as the cardinality of <code>e</code>
* in <code>collection</code> unless <code>retain</code> does not contain
<code>e</code>, in which
* case the cardinality is zero. This method is useful if you do not wish
to modify
* the collection <code>c</code> and thus cannot call
<code>collection.retainAll(retain);</code>.
* <p>
* This implementation iterates over <code>collection</code>, checking each
element in
* turn to see if it's contained in <code>retain</code>. If it's contained,
it's added
* to the returned list. As a consequence, it is advised to use a
collection type for
* <code>retain</code> that provides a fast (e.g. O(1)) implementation of
* {@link Collection#contains(Object)}.
*
* @param <E> the element type
* @param collection the collection whose contents are the target of the
#retailAll operation
* @param retain the collection containing the elements to be retained in
the returned collection
* @return a <code>List</code> containing all the elements of <code>c</code>
* that occur at least once in <code>retain</code>.
* @throws NullPointerException if either parameter is null
* @since 3.2
*/
public static <E> List<E> retainAll(final Collection<E> collection, final
Collection<?> retain) {
final List<E> list = new ArrayList<E>(Math.min(collection.size(),
retain.size()));
for (final E obj : collection) {
if (retain.contains(obj)) {
list.add(obj);
}
}
return list;
}
{noformat}
In the case of ListUtils:retainAll, the inefficiency is properly documented.
Perhaps the disclaimer about potential inefficiencies depending on the type
of the parameter collection in ListUtils:retainAll should also be included in
CollectionUtils:retainAll.
was:
The method retainAll in CollectionUtils is inefficient when the parameter
collection has a slow containment method.
The following is the current implementation with its documentation:
============================
/**
* Returns a collection containing all the elements in
<code>collection</code>
* that are also in <code>retain</code>. The cardinality of an element
<code>e</code>
* in the returned collection is the same as the cardinality of
<code>e</code>
* in <code>collection</code> unless <code>retain</code> does not contain
<code>e</code>, in which
* case the cardinality is zero. This method is useful if you do not wish
to modify
* the collection <code>c</code> and thus cannot call
<code>c.retainAll(retain);</code>.
*
* @param <C> the type of object the {@link Collection} contains
* @param collection the collection whose contents are the target of the
#retailAll operation
* @param retain the collection containing the elements to be retained in
the returned collection
* @return a <code>Collection</code> containing all the elements of
<code>collection</code>
* that occur at least once in <code>retain</code>.
* @throws NullPointerException if either parameter is null
* @since 3.2
*/
public static <C> Collection<C> retainAll(final Collection<C> collection,
final Collection<?> retain) {
return ListUtils.retainAll(collection, retain);
}
=======================================
We can notice the inefficiency by looking at the retainAll method in ListUtils.
The retainAll method from ListUtils is implemented and documented as follows:
=======================================
/**
* Returns a List containing all the elements in <code>collection</code>
* that are also in <code>retain</code>. The cardinality of an element
<code>e</code>
* in the returned list is the same as the cardinality of <code>e</code>
* in <code>collection</code> unless <code>retain</code> does not contain
<code>e</code>, in which
* case the cardinality is zero. This method is useful if you do not wish
to modify
* the collection <code>c</code> and thus cannot call
<code>collection.retainAll(retain);</code>.
* <p>
* This implementation iterates over <code>collection</code>, checking each
element in
* turn to see if it's contained in <code>retain</code>. If it's contained,
it's added
* to the returned list. As a consequence, it is advised to use a
collection type for
* <code>retain</code> that provides a fast (e.g. O(1)) implementation of
* {@link Collection#contains(Object)}.
*
* @param <E> the element type
* @param collection the collection whose contents are the target of the
#retailAll operation
* @param retain the collection containing the elements to be retained in
the returned collection
* @return a <code>List</code> containing all the elements of <code>c</code>
* that occur at least once in <code>retain</code>.
* @throws NullPointerException if either parameter is null
* @since 3.2
*/
public static <E> List<E> retainAll(final Collection<E> collection, final
Collection<?> retain) {
final List<E> list = new ArrayList<E>(Math.min(collection.size(),
retain.size()));
for (final E obj : collection) {
if (retain.contains(obj)) {
list.add(obj);
}
}
return list;
}
=======================================
In the case of ListUtils:retainAll, the inefficiency is properly documented.
Perhaps the disclaimer about potential inefficiencies depending on the type
of the parameter collection in ListUtils:retainAll should also be included in
CollectionUtils:retainAll.
> Undocumented performance issue in the retainAll method in CollectionUtils
> --------------------------------------------------------------------------
>
> Key: COLLECTIONS-544
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-544
> Project: Commons Collections
> Issue Type: Bug
> Components: Collection
> Affects Versions: 4.1
> Environment: Ubuntu 14.04
> Reporter: Oswaldo Olivo
> Priority: Trivial
> Labels: Collections, documentaion, performance
>
> The method retainAll in CollectionUtils is inefficient when the parameter
> collection has a slow containment method.
> The following is the current implementation with its documentation:
> ============================
> /**
> * Returns a collection containing all the elements in
> <code>collection</code>
> * that are also in <code>retain</code>. The cardinality of an element
> <code>e</code>
> * in the returned collection is the same as the cardinality of
> <code>e</code>
> * in <code>collection</code> unless <code>retain</code> does not contain
> <code>e</code>, in which
> * case the cardinality is zero. This method is useful if you do not wish
> to modify
> * the collection <code>c</code> and thus cannot call
> <code>c.retainAll(retain);</code>.
> *
> * @param <C> the type of object the {@link Collection} contains
> * @param collection the collection whose contents are the target of the
> #retailAll operation
> * @param retain the collection containing the elements to be retained
> in the returned collection
> * @return a <code>Collection</code> containing all the elements of
> <code>collection</code>
> * that occur at least once in <code>retain</code>.
> * @throws NullPointerException if either parameter is null
> * @since 3.2
> */
> public static <C> Collection<C> retainAll(final Collection<C> collection,
> final Collection<?> retain) {
> return ListUtils.retainAll(collection, retain);
> }
> =======================================
> We can notice the inefficiency by looking at the retainAll method in
> ListUtils.
> The retainAll method from ListUtils is implemented and documented as follows:
> =======================================
> /**
> * Returns a List containing all the elements in <code>collection</code>
> * that are also in <code>retain</code>. The cardinality of an element
> <code>e</code>
> * in the returned list is the same as the cardinality of <code>e</code>
> * in <code>collection</code> unless <code>retain</code> does not contain
> <code>e</code>, in which
> * case the cardinality is zero. This method is useful if you do not wish
> to modify
> * the collection <code>c</code> and thus cannot call
> <code>collection.retainAll(retain);</code>.
> * <p>
> * This implementation iterates over <code>collection</code>, checking
> each element in
> * turn to see if it's contained in <code>retain</code>. If it's
> contained, it's added
> * to the returned list. As a consequence, it is advised to use a
> collection type for
> * <code>retain</code> that provides a fast (e.g. O(1)) implementation of
> * {@link Collection#contains(Object)}.
> *
> * @param <E> the element type
> * @param collection the collection whose contents are the target of the
> #retailAll operation
> * @param retain the collection containing the elements to be retained
> in the returned collection
> * @return a <code>List</code> containing all the elements of
> <code>c</code>
> * that occur at least once in <code>retain</code>.
> * @throws NullPointerException if either parameter is null
> * @since 3.2
> */
> public static <E> List<E> retainAll(final Collection<E> collection, final
> Collection<?> retain) {
> final List<E> list = new ArrayList<E>(Math.min(collection.size(),
> retain.size()));
> for (final E obj : collection) {
> if (retain.contains(obj)) {
> list.add(obj);
> }
> }
> return list;
> }
> {noformat}
> In the case of ListUtils:retainAll, the inefficiency is properly documented.
> Perhaps the disclaimer about potential inefficiencies depending on the type
> of the parameter collection in ListUtils:retainAll should also be included in
> CollectionUtils:retainAll.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)