[ 
https://issues.apache.org/jira/browse/COLLECTIONS-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Neidhart updated COLLECTIONS-549:
----------------------------------------
    Affects Version/s:     (was: 4.1)
                       4.0

> Linear time implementation of retainAll for CollectionUtils
> -----------------------------------------------------------
>
>                 Key: COLLECTIONS-549
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-549
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>              Labels: Feature, Performance
>
> CollectionUtils provides a linear time implementation of containsAll using 
> additional linear space.
> There is a linear time implementation of retainAll as follows:
> {code}
>  // Eliminates from coll1 all elements that are not in coll2.
>     // It runs in O(m+n) size, requiring additional O(m) space.
>     public static boolean efficientRetainAll(final Collection<?> coll1,final 
> Collection<?> coll2) {
>       // If coll2 is empty there are no elements to retain.
>       if(coll2.isEmpty()) {
>           return coll1.removeAll(coll1);
>       }
>       // Simple case when we're supposed to retain all elements
>       // in the first collection.
>       if(coll1==coll2)
>           return false;
>       
>       // it1 iterates over coll1 and it2 iterares over coll2,
>       // and seen contains all the elements before it2, allowing 
>       // us to never revisit previous elements.
>       // The algorithm iterates through it1, checks to see if we've 
>       // already seen the current element of it1 via a Hashset 
>       // efficient check, or traverses elements of it2 until we find 
>       // it or it2 ends. At each iteration over it2 we add the
>       // elements to seen to avoid revisiting items.
>       Iterator<?> it1 = coll1.iterator();
>       Iterator<?> it2 = coll2.iterator();
>       HashSet<Object> seen = new HashSet<Object>();
>       boolean changed=false;
>       // Traverse all the elements in coll1.
>       while(it1.hasNext()) {
>           final Object o=it1.next();
>           // If we've seen this element in coll2, keep it.
>           if(seen.contains(o))
>               continue;
>           // Otherwise, check for its containment in coll2, while
>           // adding the elements to seen.
>           boolean contained=false;
>           while(it2.hasNext()) {
>               final Object o2=it2.next();
>               seen.add(o2);
>               // Found the element in coll2.
>               if(o2.equals(o)) {
>                   contained=true;
>                   break;
>               }
>           }
>           // If the element was not found in coll2, remove it from it1.
>           if(!contained) {
>               changed=true;
>               it1.remove();
>           }
>       }
>       
>       return changed;
>     }
> {code}
> Besides providing this in CollectionUtils for the user, this function could 
> be 
> beneficial within other functions, where the existing retainAll exhibits a 
> significant slowdown for large collections:
> 1) https://issues.apache.org/jira/browse/COLLECTIONS-534
> 2) https://issues.apache.org/jira/browse/COLLECTIONS-548



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to