Oswaldo Olivo created COLLECTIONS-548:
-----------------------------------------
Summary: Performance issue in CompositeCollection::retainAll
Key: COLLECTIONS-548
URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
Project: Commons Collections
Issue Type: Bug
Components: Collection
Affects Versions: 4.1
Environment: Ubuntu 14.04
Reporter: Oswaldo Olivo
Priority: Minor
There seems to be a performance problem with the function retainAll of
CompositeCollection.
The following is the code of the function:
{code}
/**
* Retains all the elements in the specified collection in this composite
collection,
* removing all others.
* <p>
* This implementation calls <code>retainAll()</code> on each collection.
*
* @param coll the collection to remove
* @return true if the collection was modified
* @throws UnsupportedOperationException if retainAll is unsupported
*/
public boolean retainAll(final Collection<?> coll) {
boolean changed = false;
for (final Collection<E> item : all) {
changed |= item.retainAll(coll);
}
return changed;
}
{code}
The performance problem occurs when the underlying collections in the current
collection have a slow retainAll method. Whenever we're relying on
Collection::retainAll, slow cases tend to occur when the parameter collection
has a slow contains method.
The following test harness shows the performance degradation between
using a list and using a set as a parameter, across different collection sizes.
{code}
public static void compositeCollectionRetainAllTest(boolean original) {
int N=500000;
ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
for(int i=0;i<N;++i) {
firstArrayList.add(new Integer(i));
secondArrayList.add(new Integer(N-1-i));
}
CompositeCollection col = new CompositeCollection(firstArrayList);
col.retainAll(original ? secondArrayList : (new
HashSet<Integer>(secondArrayList)));
}
{code}
In the following table "Original" corresponds to the time taken by
the existing implementation of CompositeCollection::retainAll, "Repaired" to
the time taken by the function invoked with the parameter converted into a set,
and "Speed-up" to the speed-up obtained after the repair.
N Original(s) Repaired(s) Speed-up(X)
10 1.04 1.04 1.00
100 1.04 1.05 0.99
1000 1.06 1.06 1.00
10000 1.12 1.10 1.01
100000 9.34 1.15 8.12
500000 > 300 1.29 > 232.55
If it's unacceptable to convert the parameter into a set before calling
retainsAll, a solution would be to include a warning to the user in the
documentation that the parameter should have a fast contains method when
possible.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)