[ 
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286597#comment-14286597
 ] 

Oswaldo Olivo commented on COLLECTIONS-548:
-------------------------------------------

Here's another potential fix by using a more efficient retainAll method in the 
spirit of CollectionUtils::containsAll


{code}
------------------------------------------------------------

    // Eliminates from coll1 all elements that are not in coll2.
    // It runs in O(m+n) size, requiring additional O(m) space.
    public static boolean efficientRetainAll(final Collection<?> coll1,final 
Collection<?> coll2) {

        // If coll2 is empty there are no elements to retain.
        if(coll2.isEmpty()) {
            return coll1.removeAll(coll1);
        }

        // Simple case when we're supposed to retain all elements
        // in the first collection.
        if(coll1==coll2)
            return false;
        
        // it1 iterates over coll1 and it2 iterares over coll2,
        // and seen contains all the elements before it2, allowing 
        // us to never revisit previous elements.
        // The algorithm iterates through it1, checks to see if we've 
        // already seen the current element of it1 via a Hashset 
        // efficient check, or traverses elements of it2 until we find 
        // it or it2 ends. At each iteration over it2 we add the
        // elements to seen to avoid revisiting items.
        Iterator<?> it1 = coll1.iterator();
        Iterator<?> it2 = coll2.iterator();
        HashSet<Object> seen = new HashSet<Object>();
        boolean changed=false;
        // Traverse all the elements in coll1.
        while(it1.hasNext()) {
            final Object o=it1.next();
            // If we've seen this element in coll2, keep it.
            if(seen.contains(o))
                continue;
            // Otherwise, check for its containment in coll2, while
            // adding the elements to seen.
            boolean contained=false;
            while(it2.hasNext()) {
                final Object o2=it2.next();
                seen.add(o2);
                // Found the element in coll2.
                if(o2.equals(o)) {
                    contained=true;
                    break;
                }
            }
            // If the element was not found in coll2, remove it from it1.
            if(!contained) {
                changed=true;
                it1.remove();
            }

        }
        

        return changed;
    }
{code}
----------------------


And the harness:

-----------------------
{code}

    public static void  compositeCollectionRetainAllTest(boolean original) {

        int N=500000;

        ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
        ArrayList<Integer> secondArrayList=new ArrayList<Integer>();

        for(int i=0;i<N;++i) {

            firstArrayList.add(new Integer(i));
            secondArrayList.add(new Integer(N-1-i));
            
        }
        CompositeCollection col = new CompositeCollection(firstArrayList);

        if(original)
            col.retainAll(secondArrayList);
        else 
            efficientRetainAll(col,secondArrayList);



    }
{code}

------------------------------

The results are:

N               Original(s)     Repaired(s)     Speed-up(X)

10               1.04           1.05                    0.99
100              1.04           1.05                    0.99
1000             1.06           1.06                    1.00
10000            1.12           1.10                    1.01
100000           9.34           1.15                    8.12
500000           > 300          1.34                    > 223.88


> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
>                 Key: COLLECTIONS-548
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Collection
>    Affects Versions: 4.1
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: performance
>
> There seems to be a performance problem with the function retainAll of 
> CompositeCollection. This is analogous to 
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>  
>     /**
>      * Retains all the elements in the specified collection in this composite 
> collection,
>      * removing all others.
>      * <p>
>      * This implementation calls <code>retainAll()</code> on each collection.
>      *
>      * @param coll  the collection to remove
>      * @return true if the collection was modified
>      * @throws UnsupportedOperationException if retainAll is unsupported
>      */
>     public boolean retainAll(final Collection<?> coll) {
>         boolean changed = false;
>         for (final Collection<E> item : all) {
>             changed |= item.retainAll(coll);
>         }
>         return changed;
>     }
> {code}
> The performance problem occurs when the underlying collections in the current 
> collection have a slow retainAll method. Whenever we're relying on 
> Collection::retainAll, slow cases tend to occur when the parameter collection 
> has a slow contains method.
> The following test harness shows the performance degradation between 
> using a list and using a set as a parameter, across different collection 
> sizes.
> {code}
>  public static void   compositeCollectionRetainAllTest(boolean original) {
>       int N=500000;
>       ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
>       ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
>       for(int i=0;i<N;++i) {
>           firstArrayList.add(new Integer(i));
>           secondArrayList.add(new Integer(N-1-i));
>           
>       }
>       CompositeCollection col = new CompositeCollection(firstArrayList);
>       col.retainAll(original ? secondArrayList : (new 
> HashSet<Integer>(secondArrayList)));
>       
>     }
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of CompositeCollection::retainAll, "Repaired" to 
> the time taken by the function invoked with the parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
> N             Original(s)     Repaired(s)     Speed-up(X)
> 10             1.04           1.04                    1.00
> 100            1.04           1.05                    0.99
> 1000  1.06                    1.06                    1.00
> 10000 1.12                    1.10                    1.01
> 100000        9.34                    1.15                    8.12
> 500000        > 300           1.29                    > 232.55
> If it's unacceptable to convert the parameter into a set before calling 
> retainsAll, a solution would be to include a warning to the user in the 
> documentation that the parameter should have a fast contains method when 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to