[
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Neidhart updated COLLECTIONS-548:
----------------------------------------
Affects Version/s: (was: 4.1)
4.0
> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
> Key: COLLECTIONS-548
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
> Project: Commons Collections
> Issue Type: Bug
> Components: Collection
> Affects Versions: 4.0
> Environment: Ubuntu 14.04
> Reporter: Oswaldo Olivo
> Priority: Minor
> Labels: performance
>
> There seems to be a performance problem with the function retainAll of
> CompositeCollection. This is analogous to
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>
> /**
> * Retains all the elements in the specified collection in this composite
> collection,
> * removing all others.
> * <p>
> * This implementation calls <code>retainAll()</code> on each collection.
> *
> * @param coll the collection to remove
> * @return true if the collection was modified
> * @throws UnsupportedOperationException if retainAll is unsupported
> */
> public boolean retainAll(final Collection<?> coll) {
> boolean changed = false;
> for (final Collection<E> item : all) {
> changed |= item.retainAll(coll);
> }
> return changed;
> }
> {code}
> The performance problem occurs when the underlying collections in the current
> collection have a slow retainAll method. Whenever we're relying on
> Collection::retainAll, slow cases tend to occur when the parameter collection
> has a slow contains method.
> The following test harness shows the performance degradation between
> using a list and using a set as a parameter, across different collection
> sizes.
> {code}
> public static void compositeCollectionRetainAllTest(boolean original) {
> int N=500000;
> ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> for(int i=0;i<N;++i) {
> firstArrayList.add(new Integer(i));
> secondArrayList.add(new Integer(N-1-i));
>
> }
> CompositeCollection col = new CompositeCollection(firstArrayList);
> col.retainAll(original ? secondArrayList : (new
> HashSet<Integer>(secondArrayList)));
>
> }
> {code}
> In the following table "Original" corresponds to the time taken by
> the existing implementation of CompositeCollection::retainAll, "Repaired" to
> the time taken by the function invoked with the parameter converted into a
> set, and "Speed-up" to the speed-up obtained after the repair.
> N Original(s) Repaired(s) Speed-up(X)
> 10 1.04 1.04 1.00
> 100 1.04 1.05 0.99
> 1000 1.06 1.06 1.00
> 10000 1.12 1.10 1.01
> 100000 9.34 1.15 8.12
> 500000 > 300 1.29 > 232.55
> If it's unacceptable to convert the parameter into a set before calling
> retainsAll, a solution would be to include a warning to the user in the
> documentation that the parameter should have a fast contains method when
> possible.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)