[ https://issues.apache.org/jira/browse/LANG-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472455#comment-13472455 ]
Sebb commented on LANG-839:
---------------------------
Further testing shows that the approach used in the original patch - i.e. using
a version of removeAll() that processes the BitSet directly - is generally
faster than using the original removeAll() method after converting the BitSet
to int[]. In the results below, Ratio is the bitset figure as a percentage of
the extract figure.
Win XP:
Ratio=92% array=100 count=1 extract=10314998 bitset=9571607
Ratio=68% array=100 count=10 extract=14912510 bitset=10283430
Ratio=15% array=100 count=50 extract=50206660 bitset=8017779
Ratio=8% array=100 count=100 extract=92868228 bitset=7494807
Ratio=86% array=1000 count=10 extract=42377453 bitset=36508272
Ratio=28% array=1000 count=100 extract=124472803 bitset=35606481
Ratio=4% array=1000 count=500 extract=570030828 bitset=24349463
Ratio=1% array=1000 count=1000 extract=1099601765 bitset=12346262
Continuum:
Ratio=76% array=100 count=1 extract=2948847 bitset=2257111
Ratio=32% array=100 count=10 extract=4860676 bitset=1589708
Ratio=6% array=100 count=50 extract=17143953 bitset=1160451
Ratio=1% array=100 count=100 extract=29390021 bitset=449595
Ratio=87% array=1000 count=10 extract=16487025 bitset=14461313
Ratio=30% array=1000 count=100 extract=42920962 bitset=13228312
Ratio=4% array=1000 count=500 extract=199373015 bitset=9112329
Ratio=0% array=1000 count=1000 extract=387091985 bitset=1126133
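
For illustration only, here is a minimal sketch of what a removeAll() that
processes the BitSet directly could look like for int[]. It is not the code
from LANG-839.patch; the class name BitSetRemoveSketch and the method
signature are assumptions made for this example. It copies each run of kept
elements between removed indexes, so no intermediate int[] of indexes is built.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, not the code from LANG-839.patch.
final class BitSetRemoveSketch {

    // Returns a copy of 'array' without the elements whose indexes are set
    // in 'indices'. Set bits at or beyond array.length are ignored.
    static int[] removeAll(int[] array, BitSet indices) {
        int[] result = new int[array.length];
        int destIndex = 0;
        int srcIndex = 0;
        while (srcIndex < array.length) {
            // Find the end of the current run of kept elements.
            int nextRemoved = indices.nextSetBit(srcIndex);
            int runEnd = (nextRemoved < 0 || nextRemoved > array.length)
                    ? array.length
                    : nextRemoved;
            int count = runEnd - srcIndex;
            System.arraycopy(array, srcIndex, result, destIndex, count);
            destIndex += count;
            // Skip the run of removed elements that follows, if any.
            srcIndex = (runEnd < array.length)
                    ? indices.nextClearBit(runEnd)
                    : array.length;
        }
        // Trim the unused tail of the result buffer.
        return Arrays.copyOf(result, destIndex);
    }
}
```

With bits 1 and 3 set, removeAll(new int[] {10, 20, 30, 40, 50}, bits) keeps
the elements at indexes 0, 2 and 4.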
> ArrayUtils removeElements methods use unnecessary HashSet
> ---------------------------------------------------------
>
> Key: LANG-839
> URL: https://issues.apache.org/jira/browse/LANG-839
> Project: Commons Lang
> Issue Type: Improvement
> Components: lang.*
> Affects Versions: 3.1
> Reporter: Sebb
> Priority: Minor
> Fix For: 3.2
>
> Attachments: LANG-839.patch
>
>
> The removeElements() methods use a HashSet to collect the indexes that need
> removing.
> This requires creating Integer objects for each index, and the HashSet then
> has to be converted into an int[] array.
> It would be more efficient to store the entries in an actual int[] array.
> The maximum size of this is the length of the values array (or the length of
> the input array if that is shorter).
> The array must be truncated before calling the private removeAll() method;
> this can be done with Arrays.copyOf(x[], length).
> However, if the arrays are very large, and most of the values do not appear
> in the input, this might result in using more memory than the HashSet
> implementation.
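
The int[] idea in the description above can be sketched roughly as follows.
This is an assumption-laden illustration, not the Commons Lang implementation:
the class IndexArraySketch and the helpers indexesToRemove() and contains()
are hypothetical names, and the simple linear scans are chosen for brevity
rather than speed. It sizes the index array to the smaller of the two lengths
and truncates it with Arrays.copyOf() before it would be handed to removeAll().

```java
import java.util.Arrays;

// Hypothetical sketch of collecting removal indexes in an int[] instead of
// a HashSet<Integer>; not the Commons Lang implementation.
final class IndexArraySketch {

    // Collects, for each entry in 'values', the index of the first matching
    // element of 'array' that has not already been claimed by an earlier value.
    static int[] indexesToRemove(int[] array, int... values) {
        // Upper bound from the issue text: at most one index per value,
        // and never more indexes than the input array has elements.
        int[] indexes = new int[Math.min(array.length, values.length)];
        int count = 0;
        for (int value : values) {
            for (int i = 0; i < array.length; i++) {
                if (array[i] == value && !contains(indexes, count, i)) {
                    indexes[count++] = i;
                    break;
                }
            }
        }
        // Truncate to the number of indexes actually found.
        return Arrays.copyOf(indexes, count);
    }

    // Linear search over the first 'count' collected indexes.
    private static boolean contains(int[] indexes, int count, int index) {
        for (int k = 0; k < count; k++) {
            if (indexes[k] == index) {
                return true;
            }
        }
        return false;
    }
}
```

For the input {1, 2, 3, 2, 4} and the values 2, 5, 2, the buffer is allocated
with three slots but only two are filled, which is why the truncation step is
needed.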