Oswaldo Olivo created COLLECTIONS-549:
-----------------------------------------

             Summary: Linear time implementation of retainAll for 
CollectionUtils
                 Key: COLLECTIONS-549
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-549
             Project: Commons Collections
          Issue Type: Improvement
          Components: Collection
    Affects Versions: 4.1
         Environment: Ubuntu 14.04
            Reporter: Oswaldo Olivo


CollectionUtils provides a linear time implementation of containsAll using 
additional linear space.

There is a linear time implementation of retainAll as follows:

{code}

 // Eliminates from coll1 all elements that are not in coll2.
    // It runs in O(m+n) size, requiring additional O(m) space.
    public static boolean efficientRetainAll(final Collection<?> coll1,final 
Collection<?> coll2) {

        // If coll2 is empty there are no elements to retain.
        if(coll2.isEmpty()) {
            return coll1.removeAll(coll1);
        }

        // Simple case when we're supposed to retain all elements
        // in the first collection.
        if(coll1==coll2)
            return false;
        
        // it1 iterates over coll1 and it2 iterares over coll2,
        // and seen contains all the elements before it2, allowing 
        // us to never revisit previous elements.
        // The algorithm iterates through it1, checks to see if we've 
        // already seen the current element of it1 via a Hashset 
        // efficient check, or traverses elements of it2 until we find 
        // it or it2 ends. At each iteration over it2 we add the
        // elements to seen to avoid revisiting items.
        Iterator<?> it1 = coll1.iterator();
        Iterator<?> it2 = coll2.iterator();
        HashSet<Object> seen = new HashSet<Object>();
        boolean changed=false;
        // Traverse all the elements in coll1.
        while(it1.hasNext()) {
            final Object o=it1.next();
            // If we've seen this element in coll2, keep it.
            if(seen.contains(o))
                continue;
            // Otherwise, check for its containment in coll2, while
            // adding the elements to seen.
            boolean contained=false;
            while(it2.hasNext()) {
                final Object o2=it2.next();
                seen.add(o2);
                // Found the element in coll2.
                if(o2.equals(o)) {
                    contained=true;
                    break;
                }
            }
            // If the element was not found in coll2, remove it from it1.
            if(!contained) {
                changed=true;
                it1.remove();
            }

        }
        

        return changed;
    }

{code}

Besides providing this in CollectionUtils for the user, this function could be 
beneficial within other functions, where the existing retainAll exhibits a 
significant slowdown for large collections:

1) https://issues.apache.org/jira/browse/COLLECTIONS-534
2) https://issues.apache.org/jira/browse/COLLECTIONS-548





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to