Oswaldo Olivo created COLLECTIONS-549:
-----------------------------------------
Summary: Linear time implementation of retainAll for
CollectionUtils
Key: COLLECTIONS-549
URL: https://issues.apache.org/jira/browse/COLLECTIONS-549
Project: Commons Collections
Issue Type: Improvement
Components: Collection
Affects Versions: 4.1
Environment: Ubuntu 14.04
Reporter: Oswaldo Olivo
CollectionUtils provides a linear time implementation of containsAll using
additional linear space.
There is a linear time implementation of retainAll as follows:
{code}
// Eliminates from coll1 all elements that are not in coll2.
// It runs in O(m+n) size, requiring additional O(m) space.
public static boolean efficientRetainAll(final Collection<?> coll1,final
Collection<?> coll2) {
// If coll2 is empty there are no elements to retain.
if(coll2.isEmpty()) {
return coll1.removeAll(coll1);
}
// Simple case when we're supposed to retain all elements
// in the first collection.
if(coll1==coll2)
return false;
// it1 iterates over coll1 and it2 iterares over coll2,
// and seen contains all the elements before it2, allowing
// us to never revisit previous elements.
// The algorithm iterates through it1, checks to see if we've
// already seen the current element of it1 via a Hashset
// efficient check, or traverses elements of it2 until we find
// it or it2 ends. At each iteration over it2 we add the
// elements to seen to avoid revisiting items.
Iterator<?> it1 = coll1.iterator();
Iterator<?> it2 = coll2.iterator();
HashSet<Object> seen = new HashSet<Object>();
boolean changed=false;
// Traverse all the elements in coll1.
while(it1.hasNext()) {
final Object o=it1.next();
// If we've seen this element in coll2, keep it.
if(seen.contains(o))
continue;
// Otherwise, check for its containment in coll2, while
// adding the elements to seen.
boolean contained=false;
while(it2.hasNext()) {
final Object o2=it2.next();
seen.add(o2);
// Found the element in coll2.
if(o2.equals(o)) {
contained=true;
break;
}
}
// If the element was not found in coll2, remove it from it1.
if(!contained) {
changed=true;
it1.remove();
}
}
return changed;
}
{code}
Besides providing this in CollectionUtils for the user, this function could be
beneficial within other functions, where the existing retainAll exhibits a
significant slowdown for large collections:
1) https://issues.apache.org/jira/browse/COLLECTIONS-534
2) https://issues.apache.org/jira/browse/COLLECTIONS-548
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)