[ 
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oswaldo Olivo updated COLLECTIONS-548:
--------------------------------------
    Description: 
There seems to be a performance problem with the function retainAll of 
CompositeCollection. This is analogous to 
https://issues.apache.org/jira/browse/COLLECTIONS-534 .

The following is the code of the function:

{code}
 
    /**
     * Retains all the elements in the specified collection in this composite 
collection,
     * removing all others.
     * <p>
     * This implementation calls <code>retainAll()</code> on each collection.
     *
     * @param coll  the collection to remove
     * @return true if the collection was modified
     * @throws UnsupportedOperationException if retainAll is unsupported
     */
    public boolean retainAll(final Collection<?> coll) {
        boolean changed = false;
        for (final Collection<E> item : all) {
            changed |= item.retainAll(coll);
        }
        return changed;
    }


{code}

The performance problem occurs when the underlying collections in the current 
collection have a slow retainAll method. Whenever we're relying on 
Collection::retainAll, slow cases tend to occur when the parameter collection 
has a slow contains method.

The following test harness shows the performance degradation between 
using a list and using a set as a parameter, across different collection sizes.

{code}
 public static void     compositeCollectionRetainAllTest(boolean original) {

        int N=500000;

        ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
        ArrayList<Integer> secondArrayList=new ArrayList<Integer>();

        for(int i=0;i<N;++i) {

            firstArrayList.add(new Integer(i));
            secondArrayList.add(new Integer(N-1-i));
            
        }
        CompositeCollection col = new CompositeCollection(firstArrayList);

        col.retainAll(original ? secondArrayList : (new 
HashSet<Integer>(secondArrayList)));
        

    }


{code}

In the following table "Original" corresponds to the time taken by 
the existing implementation of CompositeCollection::retainAll, "Repaired" to 
the time taken by the function invoked with the parameter converted into a set, 
and "Speed-up" to the speed-up obtained after the repair.

N               Original(s)     Repaired(s)     Speed-up(X)

10               1.04           1.04                    1.00
100              1.04           1.05                    0.99
1000    1.06                    1.06                    1.00
10000   1.12                    1.10                    1.01
100000  9.34                    1.15                    8.12
500000  > 300           1.29                    > 232.55


If it's unacceptable to convert the parameter into a set before calling 
retainsAll, a solution would be to include a warning to the user in the 
documentation that the parameter should have a fast contains method when 
possible.


  was:
There seems to be a performance problem with the function retainAll of 
CompositeCollection.

The following is the code of the function:

{code}
 
    /**
     * Retains all the elements in the specified collection in this composite 
collection,
     * removing all others.
     * <p>
     * This implementation calls <code>retainAll()</code> on each collection.
     *
     * @param coll  the collection to remove
     * @return true if the collection was modified
     * @throws UnsupportedOperationException if retainAll is unsupported
     */
    public boolean retainAll(final Collection<?> coll) {
        boolean changed = false;
        for (final Collection<E> item : all) {
            changed |= item.retainAll(coll);
        }
        return changed;
    }


{code}

The performance problem occurs when the underlying collections in the current 
collection have a slow retainAll method. Whenever we're relying on 
Collection::retainAll, slow cases tend to occur when the parameter collection 
has a slow contains method.

The following test harness shows the performance degradation between 
using a list and using a set as a parameter, across different collection sizes.

{code}
 public static void     compositeCollectionRetainAllTest(boolean original) {

        int N=500000;

        ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
        ArrayList<Integer> secondArrayList=new ArrayList<Integer>();

        for(int i=0;i<N;++i) {

            firstArrayList.add(new Integer(i));
            secondArrayList.add(new Integer(N-1-i));
            
        }
        CompositeCollection col = new CompositeCollection(firstArrayList);

        col.retainAll(original ? secondArrayList : (new 
HashSet<Integer>(secondArrayList)));
        

    }


{code}

In the following table "Original" corresponds to the time taken by 
the existing implementation of CompositeCollection::retainAll, "Repaired" to 
the time taken by the function invoked with the parameter converted into a set, 
and "Speed-up" to the speed-up obtained after the repair.

N               Original(s)     Repaired(s)     Speed-up(X)

10               1.04           1.04                    1.00
100              1.04           1.05                    0.99
1000    1.06                    1.06                    1.00
10000   1.12                    1.10                    1.01
100000  9.34                    1.15                    8.12
500000  > 300           1.29                    > 232.55


If it's unacceptable to convert the parameter into a set before calling 
retainsAll, a solution would be to include a warning to the user in the 
documentation that the parameter should have a fast contains method when 
possible.



> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
>                 Key: COLLECTIONS-548
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Collection
>    Affects Versions: 4.1
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: performance
>
> There seems to be a performance problem with the function retainAll of 
> CompositeCollection. This is analogous to 
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>  
>     /**
>      * Retains all the elements in the specified collection in this composite 
> collection,
>      * removing all others.
>      * <p>
>      * This implementation calls <code>retainAll()</code> on each collection.
>      *
>      * @param coll  the collection to remove
>      * @return true if the collection was modified
>      * @throws UnsupportedOperationException if retainAll is unsupported
>      */
>     public boolean retainAll(final Collection<?> coll) {
>         boolean changed = false;
>         for (final Collection<E> item : all) {
>             changed |= item.retainAll(coll);
>         }
>         return changed;
>     }
> {code}
> The performance problem occurs when the underlying collections in the current 
> collection have a slow retainAll method. Whenever we're relying on 
> Collection::retainAll, slow cases tend to occur when the parameter collection 
> has a slow contains method.
> The following test harness shows the performance degradation between 
> using a list and using a set as a parameter, across different collection 
> sizes.
> {code}
>  public static void   compositeCollectionRetainAllTest(boolean original) {
>       int N=500000;
>       ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
>       ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
>       for(int i=0;i<N;++i) {
>           firstArrayList.add(new Integer(i));
>           secondArrayList.add(new Integer(N-1-i));
>           
>       }
>       CompositeCollection col = new CompositeCollection(firstArrayList);
>       col.retainAll(original ? secondArrayList : (new 
> HashSet<Integer>(secondArrayList)));
>       
>     }
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of CompositeCollection::retainAll, "Repaired" to 
> the time taken by the function invoked with the parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
> N             Original(s)     Repaired(s)     Speed-up(X)
> 10             1.04           1.04                    1.00
> 100            1.04           1.05                    0.99
> 1000  1.06                    1.06                    1.00
> 10000 1.12                    1.10                    1.01
> 100000        9.34                    1.15                    8.12
> 500000        > 300           1.29                    > 232.55
> If it's unacceptable to convert the parameter into a set before calling 
> retainsAll, a solution would be to include a warning to the user in the 
> documentation that the parameter should have a fast contains method when 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to