[ 
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290676#comment-14290676
 ] 

Oswaldo Olivo commented on COLLECTIONS-548:
-------------------------------------------

Hi Thomas,

It's understandable that changing the code might be undesirable.

But shouldn't there be a warning to the user in the documentation about the 
pathological cases in the invoked function (contains in other cases, retainAll 
in this one) ? 

In the case of AbstracLinkedList::retainAll, you would need to peek at the 
contains method implementation to see the performance impact, but in 
CompositeCollection::retainAll you would probably need to look at the 
implementation of retainAll of the underlying collections, and then the 
contains of these collections. 
More layers of function invocations w.r.t. the problematic method are more 
likely to obscure the performance bottleneck, making a stronger case to 
document the  higher level functions.

> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
>                 Key: COLLECTIONS-548
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Collection
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: performance
>
> There seems to be a performance problem with the function retainAll of 
> CompositeCollection. This is analogous to 
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>  
>     /**
>      * Retains all the elements in the specified collection in this composite 
> collection,
>      * removing all others.
>      * <p>
>      * This implementation calls <code>retainAll()</code> on each collection.
>      *
>      * @param coll  the collection to remove
>      * @return true if the collection was modified
>      * @throws UnsupportedOperationException if retainAll is unsupported
>      */
>     public boolean retainAll(final Collection<?> coll) {
>         boolean changed = false;
>         for (final Collection<E> item : all) {
>             changed |= item.retainAll(coll);
>         }
>         return changed;
>     }
> {code}
> The performance problem occurs when the underlying collections in the current 
> collection have a slow retainAll method. Whenever we're relying on 
> Collection::retainAll, slow cases tend to occur when the parameter collection 
> has a slow contains method.
> The following test harness shows the performance degradation between 
> using a list and using a set as a parameter, across different collection 
> sizes.
> {code}
>  public static void   compositeCollectionRetainAllTest(boolean original) {
>       int N=500000;
>       ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
>       ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
>       for(int i=0;i<N;++i) {
>           firstArrayList.add(new Integer(i));
>           secondArrayList.add(new Integer(N-1-i));
>           
>       }
>       CompositeCollection col = new CompositeCollection(firstArrayList);
>       col.retainAll(original ? secondArrayList : (new 
> HashSet<Integer>(secondArrayList)));
>       
>     }
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of CompositeCollection::retainAll, "Repaired" to 
> the time taken by the function invoked with the parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
> N             Original(s)     Repaired(s)     Speed-up(X)
> 10             1.04           1.04                    1.00
> 100            1.04           1.05                    0.99
> 1000  1.06                    1.06                    1.00
> 10000 1.12                    1.10                    1.01
> 100000        9.34                    1.15                    8.12
> 500000        > 300           1.29                    > 232.55
> If it's unacceptable to convert the parameter into a set before calling 
> retainsAll, a solution would be to include a warning to the user in the 
> documentation that the parameter should have a fast contains method when 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to