[
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290679#comment-14290679
]
Thomas Neidhart commented on COLLECTIONS-548:
---------------------------------------------
The documentation clearly states what the method is doing:
{noformat}
* This implementation calls <code>retainAll()</code> on each collection.
{noformat}
The retainAll() method of the Collection interface is also well-known and by
default (see AbstractCollection) calls contains() on the provided collection.
Thus users should be aware of this by now (2015). If a user is really calling
retainAll() with a huge list, it's probably better to put the elements in a set
and provide this as an argument to retainAll().
My whole point is that there's no use in providing uber-collection types that
have an optimal runtime-complexity in all cases but with the trade-off of
additional memory requirements. Users have to chose and use proper collection
types for their use case.
> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
> Key: COLLECTIONS-548
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
> Project: Commons Collections
> Issue Type: Bug
> Components: Collection
> Affects Versions: 4.0
> Environment: Ubuntu 14.04
> Reporter: Oswaldo Olivo
> Priority: Minor
> Labels: performance
>
> There seems to be a performance problem with the function retainAll of
> CompositeCollection. This is analogous to
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>
> /**
> * Retains all the elements in the specified collection in this composite
> collection,
> * removing all others.
> * <p>
> * This implementation calls <code>retainAll()</code> on each collection.
> *
> * @param coll the collection to remove
> * @return true if the collection was modified
> * @throws UnsupportedOperationException if retainAll is unsupported
> */
> public boolean retainAll(final Collection<?> coll) {
> boolean changed = false;
> for (final Collection<E> item : all) {
> changed |= item.retainAll(coll);
> }
> return changed;
> }
> {code}
> The performance problem occurs when the underlying collections in the current
> collection have a slow retainAll method. Whenever we're relying on
> Collection::retainAll, slow cases tend to occur when the parameter collection
> has a slow contains method.
> The following test harness shows the performance degradation between
> using a list and using a set as a parameter, across different collection
> sizes.
> {code}
> public static void compositeCollectionRetainAllTest(boolean original) {
> int N=500000;
> ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> for(int i=0;i<N;++i) {
> firstArrayList.add(new Integer(i));
> secondArrayList.add(new Integer(N-1-i));
>
> }
> CompositeCollection col = new CompositeCollection(firstArrayList);
> col.retainAll(original ? secondArrayList : (new
> HashSet<Integer>(secondArrayList)));
>
> }
> {code}
> In the following table "Original" corresponds to the time taken by
> the existing implementation of CompositeCollection::retainAll, "Repaired" to
> the time taken by the function invoked with the parameter converted into a
> set, and "Speed-up" to the speed-up obtained after the repair.
> N Original(s) Repaired(s) Speed-up(X)
> 10 1.04 1.04 1.00
> 100 1.04 1.05 0.99
> 1000 1.06 1.06 1.00
> 10000 1.12 1.10 1.01
> 100000 9.34 1.15 8.12
> 500000 > 300 1.29 > 232.55
> If it's unacceptable to convert the parameter into a set before calling
> retainsAll, a solution would be to include a warning to the user in the
> documentation that the parameter should have a fast contains method when
> possible.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)