[
https://issues.apache.org/jira/browse/COLLECTIONS-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Neidhart resolved COLLECTIONS-547.
-----------------------------------------
Resolution: Not a Problem
The javadoc of the method clearly states the purpose and function of this
method.
It should only be used when the first parameter is a Set or set-like data
structure with a fast contains implementation. In other cases this method
should not be used.
Delegating this method to CollectionUtils.containsAll() is no good as it has an
additional memory overhead which is not needed when used properly.
> Performance issue in SetUtils::isEqualSet
> -----------------------------------------
>
> Key: COLLECTIONS-547
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-547
> Project: Commons Collections
> Issue Type: Bug
> Components: Set
> Affects Versions: 4.1
> Environment: Ubuntu 14.04
> Reporter: Oswaldo Olivo
> Priority: Minor
> Labels: perfomance
>
> There seems to be a performance problem with the function isEqualSet of
> SetUtils when the first parameter is of a collection type that has a slow
> containsAll/contains method.
> The following is the code of the function:
> {code}
> /**
> * Tests two sets for equality as per the <code>equals()</code> contract
> * in {@link java.util.Set#equals(java.lang.Object)}.
> * <p>
> * This method is useful for implementing <code>Set</code> when you cannot
> * extend AbstractSet. The method takes Collection instances to enable
> other
> * collection types to use the Set implementation algorithm.
> * <p>
> * The relevant text (slightly paraphrased as this is a static method) is:
> * <blockquote>
> * <p>Two sets are considered equal if they have
> * the same size, and every member of the first set is contained in
> * the second. This ensures that the <tt>equals</tt> method works
> * properly across different implementations of the <tt>Set</tt>
> * interface.</p>
> *
> * <p>
> * This implementation first checks if the two sets are the same object:
> * if so it returns <tt>true</tt>. Then, it checks if the two sets are
> * identical in size; if not, it returns false. If so, it returns
> * <tt>a.containsAll((Collection) b)</tt>.</p>
> * </blockquote>
> *
> * @see java.util.Set
> * @param set1 the first set, may be null
> * @param set2 the second set, may be null
> * @return whether the sets are equal by value comparison
> */
> public static boolean isEqualSet(final Collection<?> set1, final
> Collection<?> set2) {
> if (set1 == set2) {
> return true;
> }
> if (set1 == null || set2 == null || set1.size() != set2.size()) {
> return false;
> }
> return set1.containsAll(set2);
> }
> {code}
> The problem is that in the last return statement, the function relies on the
> containsAll method of the class of the set1, which can be any type of
> collection.
> The following test harness shows the performance degradation between
> using a list and using a set as a first parameter, across different
> collection sizes.
> {code}
> public static void setUtilsisEqualSetTest(boolean original) {
> int N=500000;
> ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> for(int i=0;i<N;++i) {
> firstArrayList.add(new Integer(i));
> secondArrayList.add(new Integer((N-1)-i));
>
> }
> SetUtils.isEqualSet(original?firstArrayList:(new
> HashSet<Integer>(firstArrayList)),secondArrayList);
>
> {code}
> In the following table "Original" corresponds to the time taken by
> the existing implementation of SetUtils::isEqualSet, "Repaired" to the time
> taken by the function invoked with the first parameter converted into a
> set, and "Speed-up" to the speed-up obtained after the repair.
> N Original(s) Repaired(s) Speed-up(X)
> 10 1.01 1.02 0.99
> 100 1.02 1.02 1
> 1000 1.04 1.04 1
> 10000 1.15 1.09 1.05
> 100000 9.33 1.11 8.40
> 500000 > 300 1.26 > 238.09
> One way to deal with this would be to call the CollectionUtils::containsAll
> method instead of Collection::containsAll, since it has a linear time
> implementation instead of quadratic, and can handle the types of isEqualSet.
> Another solution would be to include a warning to the user in the
> documentation that the first parameter should have a fast containment method
> when possible.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)