[ 
https://issues.apache.org/jira/browse/COLLECTIONS-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Neidhart updated COLLECTIONS-547:
----------------------------------------
    Affects Version/s:     (was: 4.1)
                       4.0

> Performance issue in SetUtils::isEqualSet
> -----------------------------------------
>
>                 Key: COLLECTIONS-547
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-547
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Set
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: perfomance
>
> There seems to be a performance problem with the function isEqualSet of 
> SetUtils when the first parameter is of a collection type that has a slow 
> containsAll/contains method.
> The following is the code of the function:
> {code}
>     /**
>      * Tests two sets for equality as per the <code>equals()</code> contract
>      * in {@link java.util.Set#equals(java.lang.Object)}.
>      * <p>
>      * This method is useful for implementing <code>Set</code> when you cannot
>      * extend AbstractSet. The method takes Collection instances to enable 
> other
>      * collection types to use the Set implementation algorithm.
>      * <p>
>      * The relevant text (slightly paraphrased as this is a static method) is:
>      * <blockquote>
>      * <p>Two sets are considered equal if they have
>      * the same size, and every member of the first set is contained in
>      * the second. This ensures that the <tt>equals</tt> method works
>      * properly across different implementations of the <tt>Set</tt>
>      * interface.</p>
>      *
>      * <p>
>      * This implementation first checks if the two sets are the same object:
>      * if so it returns <tt>true</tt>.  Then, it checks if the two sets are
>      * identical in size; if not, it returns false. If so, it returns
>      * <tt>a.containsAll((Collection) b)</tt>.</p>
>      * </blockquote>
>      *
>      * @see java.util.Set
>      * @param set1  the first set, may be null
>      * @param set2  the second set, may be null
>      * @return whether the sets are equal by value comparison
>      */
>     public static boolean isEqualSet(final Collection<?> set1, final 
> Collection<?> set2) {
>         if (set1 == set2) {
>             return true;
>         }
>         if (set1 == null || set2 == null || set1.size() != set2.size()) {
>             return false;
>         }
>         return set1.containsAll(set2);
>     }
> {code}
> The problem is that in the last return statement, the function relies on the 
> containsAll method of the class of the set1, which can be any type of 
> collection.
> The following test harness shows the performance degradation between 
> using a list and using a set as a first parameter, across different 
> collection sizes.
> {code}
>     public static void        setUtilsisEqualSetTest(boolean original) {
>       int N=500000;
>       ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
>       ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
>       for(int i=0;i<N;++i) {
>           firstArrayList.add(new Integer(i));
>           secondArrayList.add(new Integer((N-1)-i));
>           
>       }
>       SetUtils.isEqualSet(original?firstArrayList:(new 
> HashSet<Integer>(firstArrayList)),secondArrayList);
>       
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of SetUtils::isEqualSet, "Repaired" to the time 
> taken by the function invoked with the first parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
>        N      Original(s)     Repaired(s)     Speed-up(X)
>       10                1.01              1.02                     0.99
>      100                1.02              1.02                     1
>     1000                1.04              1.04                     1
>    10000                1.15              1.09                     1.05
>   100000                9.33              1.11                     8.40
>   500000              > 300               1.26                     > 238.09
> One way to deal with this would be to call the CollectionUtils::containsAll 
> method instead of Collection::containsAll, since it has a linear time 
> implementation instead of quadratic, and can handle the types of isEqualSet.
> Another solution would be to include a warning to the user in the 
> documentation that the first parameter should have a fast containment method 
> when possible.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to