Oswaldo Olivo created COLLECTIONS-534:
-----------------------------------------
Summary: Performance bug in CollectionBag::retainAll
Key: COLLECTIONS-534
URL: https://issues.apache.org/jira/browse/COLLECTIONS-534
Project: Commons Collections
Issue Type: Bug
Components: Bag, Collection
Affects Versions: 4.0
Environment: Ubuntu 12.04
Reporter: Oswaldo Olivo
Hi,
There seems to be a performance bug in the method retainAll in the
CollectionBag class.
The problem is that the code is checking containment over the parameter
collection, which could be expensive for some types of collections like
ArrayLists.
One solution could be to convert the Collection into a HashSet and check
containment in the HashSet.
If this is done, then running retainAll on a 1,000,000 collection would take
less than 2 seconds instead of 27 mins, according to my experiments.
____________________________________________
Here's a function to expose the bug:
public static void collectionBagRetainAllTest() {
ArrayList<Integer> coll=new ArrayList<Integer>();
for(int i=0;i<=1000000;++i)
coll.add(new Integer(i));
TreeBag<Integer> treeBag=new TreeBag<Integer>(coll);
CollectionBag<Integer> bag = new CollectionBag<Integer>(treeBag);
bag.retainAll(coll);
}
_____________________________________
Here's a proposed patch:
public boolean retainAll(final Collection<?> coll) {
if (coll != null) {
boolean modified = false;
final Iterator<E> e = iterator();
HashSet<Object> set=new HashSet<Object>(coll);
while (e.hasNext()) {
if (!set.contains(e.next())) {
e.remove();
modified = true;
}
}
return modified;
} else {
// let the decorated bag handle the case of null argument
return decorated().retainAll(null);
}
}
_____________________________________
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)