[
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oswaldo Olivo updated COLLECTIONS-548:
--------------------------------------
Description:
There seems to be a performance problem with the function retainAll of
CompositeCollection. This is analogous to
https://issues.apache.org/jira/browse/COLLECTIONS-534 .
The following is the code of the function:
{code}
/**
* Retains all the elements in the specified collection in this composite
collection,
* removing all others.
* <p>
* This implementation calls <code>retainAll()</code> on each collection.
*
* @param coll the collection to remove
* @return true if the collection was modified
* @throws UnsupportedOperationException if retainAll is unsupported
*/
public boolean retainAll(final Collection<?> coll) {
boolean changed = false;
for (final Collection<E> item : all) {
changed |= item.retainAll(coll);
}
return changed;
}
{code}
The performance problem occurs when the underlying collections in the current
collection have a slow retainAll method. Whenever we're relying on
Collection::retainAll, slow cases tend to occur when the parameter collection
has a slow contains method.
The following test harness shows the performance degradation between
using a list and using a set as a parameter, across different collection sizes.
{code}
public static void compositeCollectionRetainAllTest(boolean original) {
int N=500000;
ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
for(int i=0;i<N;++i) {
firstArrayList.add(new Integer(i));
secondArrayList.add(new Integer(N-1-i));
}
CompositeCollection col = new CompositeCollection(firstArrayList);
col.retainAll(original ? secondArrayList : (new
HashSet<Integer>(secondArrayList)));
}
{code}
In the following table "Original" corresponds to the time taken by
the existing implementation of CompositeCollection::retainAll, "Repaired" to
the time taken by the function invoked with the parameter converted into a set,
and "Speed-up" to the speed-up obtained after the repair.
N Original(s) Repaired(s) Speed-up(X)
10 1.04 1.04 1.00
100 1.04 1.05 0.99
1000 1.06 1.06 1.00
10000 1.12 1.10 1.01
100000 9.34 1.15 8.12
500000 > 300 1.29 > 232.55
If it's unacceptable to convert the parameter into a set before calling
retainsAll, a solution would be to include a warning to the user in the
documentation that the parameter should have a fast contains method when
possible.
was:
There seems to be a performance problem with the function retainAll of
CompositeCollection.
The following is the code of the function:
{code}
/**
* Retains all the elements in the specified collection in this composite
collection,
* removing all others.
* <p>
* This implementation calls <code>retainAll()</code> on each collection.
*
* @param coll the collection to remove
* @return true if the collection was modified
* @throws UnsupportedOperationException if retainAll is unsupported
*/
public boolean retainAll(final Collection<?> coll) {
boolean changed = false;
for (final Collection<E> item : all) {
changed |= item.retainAll(coll);
}
return changed;
}
{code}
The performance problem occurs when the underlying collections in the current
collection have a slow retainAll method. Whenever we're relying on
Collection::retainAll, slow cases tend to occur when the parameter collection
has a slow contains method.
The following test harness shows the performance degradation between
using a list and using a set as a parameter, across different collection sizes.
{code}
public static void compositeCollectionRetainAllTest(boolean original) {
int N=500000;
ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
for(int i=0;i<N;++i) {
firstArrayList.add(new Integer(i));
secondArrayList.add(new Integer(N-1-i));
}
CompositeCollection col = new CompositeCollection(firstArrayList);
col.retainAll(original ? secondArrayList : (new
HashSet<Integer>(secondArrayList)));
}
{code}
In the following table "Original" corresponds to the time taken by
the existing implementation of CompositeCollection::retainAll, "Repaired" to
the time taken by the function invoked with the parameter converted into a set,
and "Speed-up" to the speed-up obtained after the repair.
N Original(s) Repaired(s) Speed-up(X)
10 1.04 1.04 1.00
100 1.04 1.05 0.99
1000 1.06 1.06 1.00
10000 1.12 1.10 1.01
100000 9.34 1.15 8.12
500000 > 300 1.29 > 232.55
If it's unacceptable to convert the parameter into a set before calling
retainsAll, a solution would be to include a warning to the user in the
documentation that the parameter should have a fast contains method when
possible.
> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
> Key: COLLECTIONS-548
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
> Project: Commons Collections
> Issue Type: Bug
> Components: Collection
> Affects Versions: 4.1
> Environment: Ubuntu 14.04
> Reporter: Oswaldo Olivo
> Priority: Minor
> Labels: performance
>
> There seems to be a performance problem with the function retainAll of
> CompositeCollection. This is analogous to
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>
> /**
> * Retains all the elements in the specified collection in this composite
> collection,
> * removing all others.
> * <p>
> * This implementation calls <code>retainAll()</code> on each collection.
> *
> * @param coll the collection to remove
> * @return true if the collection was modified
> * @throws UnsupportedOperationException if retainAll is unsupported
> */
> public boolean retainAll(final Collection<?> coll) {
> boolean changed = false;
> for (final Collection<E> item : all) {
> changed |= item.retainAll(coll);
> }
> return changed;
> }
> {code}
> The performance problem occurs when the underlying collections in the current
> collection have a slow retainAll method. Whenever we're relying on
> Collection::retainAll, slow cases tend to occur when the parameter collection
> has a slow contains method.
> The following test harness shows the performance degradation between
> using a list and using a set as a parameter, across different collection
> sizes.
> {code}
> public static void compositeCollectionRetainAllTest(boolean original) {
> int N=500000;
> ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> for(int i=0;i<N;++i) {
> firstArrayList.add(new Integer(i));
> secondArrayList.add(new Integer(N-1-i));
>
> }
> CompositeCollection col = new CompositeCollection(firstArrayList);
> col.retainAll(original ? secondArrayList : (new
> HashSet<Integer>(secondArrayList)));
>
> }
> {code}
> In the following table "Original" corresponds to the time taken by
> the existing implementation of CompositeCollection::retainAll, "Repaired" to
> the time taken by the function invoked with the parameter converted into a
> set, and "Speed-up" to the speed-up obtained after the repair.
> N Original(s) Repaired(s) Speed-up(X)
> 10 1.04 1.04 1.00
> 100 1.04 1.05 0.99
> 1000 1.06 1.06 1.00
> 10000 1.12 1.10 1.01
> 100000 9.34 1.15 8.12
> 500000 > 300 1.29 > 232.55
> If it's unacceptable to convert the parameter into a set before calling
> retainsAll, a solution would be to include a warning to the user in the
> documentation that the parameter should have a fast contains method when
> possible.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)