Oswaldo Olivo created COLLECTIONS-548:
-----------------------------------------

             Summary: Performance issue in CompositeCollection::retainAll
                 Key: COLLECTIONS-548
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
             Project: Commons Collections
          Issue Type: Bug
          Components: Collection
    Affects Versions: 4.1
         Environment: Ubuntu 14.04
            Reporter: Oswaldo Olivo
            Priority: Minor


There seems to be a performance problem with the function retainAll of 
CompositeCollection.

The following is the code of the function:

{code}
 
    /**
     * Retains all the elements in the specified collection in this composite 
collection,
     * removing all others.
     * <p>
     * This implementation calls <code>retainAll()</code> on each collection.
     *
     * @param coll  the collection to remove
     * @return true if the collection was modified
     * @throws UnsupportedOperationException if retainAll is unsupported
     */
    public boolean retainAll(final Collection<?> coll) {
        boolean changed = false;
        for (final Collection<E> item : all) {
            changed |= item.retainAll(coll);
        }
        return changed;
    }


{code}

The performance problem occurs when the underlying collections in the current 
collection have a slow retainAll method. Whenever we're relying on 
Collection::retainAll, slow cases tend to occur when the parameter collection 
has a slow contains method.

The following test harness shows the performance degradation between 
using a list and using a set as a parameter, across different collection sizes.

{code}
 public static void     compositeCollectionRetainAllTest(boolean original) {

        int N=500000;

        ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
        ArrayList<Integer> secondArrayList=new ArrayList<Integer>();

        for(int i=0;i<N;++i) {

            firstArrayList.add(new Integer(i));
            secondArrayList.add(new Integer(N-1-i));
            
        }
        CompositeCollection col = new CompositeCollection(firstArrayList);

        col.retainAll(original ? secondArrayList : (new 
HashSet<Integer>(secondArrayList)));
        

    }


{code}

In the following table "Original" corresponds to the time taken by 
the existing implementation of CompositeCollection::retainAll, "Repaired" to 
the time taken by the function invoked with the parameter converted into a set, 
and "Speed-up" to the speed-up obtained after the repair.

N               Original(s)     Repaired(s)     Speed-up(X)

10               1.04           1.04                    1.00
100              1.04           1.05                    0.99
1000    1.06                    1.06                    1.00
10000   1.12                    1.10                    1.01
100000  9.34                    1.15                    8.12
500000  > 300           1.29                    > 232.55


If it's unacceptable to convert the parameter into a set before calling 
retainsAll, a solution would be to include a warning to the user in the 
documentation that the parameter should have a fast contains method when 
possible.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to