[ 
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Neidhart updated COLLECTIONS-548:
----------------------------------------
    Affects Version/s:     (was: 4.1)
                       4.0

> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
>                 Key: COLLECTIONS-548
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Collection
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: performance
>
> There seems to be a performance problem with the function retainAll of 
> CompositeCollection. This is analogous to 
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>  
>     /**
>      * Retains all the elements in the specified collection in this composite 
> collection,
>      * removing all others.
>      * <p>
>      * This implementation calls <code>retainAll()</code> on each collection.
>      *
>      * @param coll  the collection to remove
>      * @return true if the collection was modified
>      * @throws UnsupportedOperationException if retainAll is unsupported
>      */
>     public boolean retainAll(final Collection<?> coll) {
>         boolean changed = false;
>         for (final Collection<E> item : all) {
>             changed |= item.retainAll(coll);
>         }
>         return changed;
>     }
> {code}
> The performance problem occurs when the underlying collections in the current 
> collection have a slow retainAll method. Whenever we're relying on 
> Collection::retainAll, slow cases tend to occur when the parameter collection 
> has a slow contains method.
> The following test harness shows the performance degradation between 
> using a list and using a set as a parameter, across different collection 
> sizes.
> {code}
>  public static void   compositeCollectionRetainAllTest(boolean original) {
>       int N=500000;
>       ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
>       ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
>       for(int i=0;i<N;++i) {
>           firstArrayList.add(new Integer(i));
>           secondArrayList.add(new Integer(N-1-i));
>           
>       }
>       CompositeCollection col = new CompositeCollection(firstArrayList);
>       col.retainAll(original ? secondArrayList : (new 
> HashSet<Integer>(secondArrayList)));
>       
>     }
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of CompositeCollection::retainAll, "Repaired" to 
> the time taken by the function invoked with the parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
> N             Original(s)     Repaired(s)     Speed-up(X)
> 10             1.04           1.04                    1.00
> 100            1.04           1.05                    0.99
> 1000  1.06                    1.06                    1.00
> 10000 1.12                    1.10                    1.01
> 100000        9.34                    1.15                    8.12
> 500000        > 300           1.29                    > 232.55
> If it's unacceptable to convert the parameter into a set before calling 
> retainsAll, a solution would be to include a warning to the user in the 
> documentation that the parameter should have a fast contains method when 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to