[jira] Closed: (COLLECTIONS-328) ListUtils.intersect is slow

Henri Yandell (JIRA) Sun, 18 Jul 2010 01:16:51 -0700

     [ 
https://issues.apache.org/jira/browse/COLLECTIONS-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Henri Yandell closed COLLECTIONS-328.
-------------------------------------

    Fix Version/s: 4.0-beta-1
                       (was: 4.0)
       Resolution: Fixed

Patch applied, thanks Thomas.

I tweaked it slightly to work with COLLECTIONS-359, thus the identified 
intersection is removed from the hash set to avoid duplicates.

It feels to me that cpu performance would increase if we built the hashset on 
the larger of the datasets as we would end up walking the smaller list, but it 
also feels that that would hurt memory more often.

svn ci -m "Applying Thomas Rogan's patch to COLLECTIONS-328, improving the 
performance to ListUtils.intersection in the manner described by Jilles van 
Gurp"
Sending        src/java/org/apache/commons/collections/ListUtils.java
Transmitting file data .
Committed revision 965176.


> ListUtils.intersect is slow
> ---------------------------
>
>                 Key: COLLECTIONS-328
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-328
>             Project: Commons Collections
>          Issue Type: Improvement
>            Reporter: Jilles van Gurp
>             Fix For: 4.0-beta-1
>
>         Attachments: IntersectHashSet.patch
>
>
> ListUtils.intersect is quite slow and can be improved by using a HashSet for 
> the contains operation which cuts the complexity from n^2 to n. I ran into 
> this by intersecting two lists with a few hundred thousand elements.
> current:
> public static List intersection(final List list1, final List list2) {
>         final ArrayList result = new ArrayList();
>         final Iterator iterator = list2.iterator();
>         while (iterator.hasNext()) {
>             final Object o = iterator.next();
>             if (list1.contains(o)) {
>                 result.add(o);
>             }
>         }
>         return result;
> Basically would work by inserting list1 into a HashSet like this:
> Set objs = new HashSet();
> objs.addAll(list1);
> and then instead of list1.contains do a objs.contains
> Further performance can be gained by picking the smallest list for this with 
> a simple size comparison.
> BTW what is this method supposed to do with duplicate entries in lists? 
> Semantics are not really clear here as opposed to set intersection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Closed: (COLLECTIONS-328) ListUtils.intersect is slow

Reply via email to