[ 
https://issues.apache.org/jira/browse/COLLECTIONS-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Barnes updated COLLECTIONS-433:
---------------------------------------

    Attachment: COLLECTIONS-433.patch

I have developed a patch that greatly improves the performance of 
{{TreeList.addAll(Collection)}}. The theoretical complexity is improved from 
_O_(_n_ log _m_) to _O_(_n_ + log _m_), where _m_ is the size of the 
{{TreeList}} and _n_ is the size of the collection to be added, and performance 
tests show a significant practical gain.

The Stack Overflow question that Adrian Nistor linked is informative but not 
the ideal algorithm to use here, as it handles the case where there are two 
arbitrary (sorted) AVL trees that must be merged into one (sorted) AVL tree. 
Because the elements of the two AVL trees may need to be interleaved in 
arbitrary ways, the best we can do in that situation is to flatten the trees 
into lists, merge them, and construct a new AVL tree, as the Stack Overflow 
answer says.

The situation here, however, is somewhat different, because we know that all 
the elements of the new tree come to the right of the original tree. (The AVL 
tree that backs {{TreeList}} is keyed by list index, not by element value.) 
Thus, a more on-point Stack Overflow question is [this 
one|http://stackoverflow.com/questions/2037212/concatenating-merging-joining-two-avl-trees],
 which deals with the question of merging/concatenating two AVL trees when the 
elements of the first tree are known to have smaller keys than those of the 
second tree. In this case, we can merge the trees in logarithmic time! The 
algorithm, outlined by user meriton on Stack Overflow, is:
# Determine the height of both trees. Assume the right tree is taller. (The 
other case is symmetric.)
# Remove the max element, _x_, from the left tree.
# In the right tree, navigate left until you reach the subtree, _s_, that is no 
taller than the left tree.
# Replace that subtree with a new subtree whose root is _x_, whose left subtree 
is the left tree, and whose right subtree is _s_.
# Rebalance.

This is a destructive operation, so to satisfy the contract for 
{{addAll(Collection)}}, we have to first copy the argument to a new tree, which 
takes _O_(_n_). But the good news is we can do this in _O_(_n_) for any 
collection, not just a {{TreeList}}! So now {{addAll(Collection)}} has 
complexity _O_(_n_ + log _m_) for _any_ collection, regardless of whether it is 
a {{TreeList}}.

To accomplish linear conversion of collections to trees, I also reimplemented 
the constructor {{TreeList(Collection)}}, which previously just invoked 
{{addAll}}. The algorithm for converting a sorted list to an AVL tree in 
_O_(_n_) time is given 
[here|http://leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html].

Here are results of some performance tests where I used {{TreeList.addAll}} to 
merge two collections of size _n_ for varying _n_ (averaged over 50 runs):

||n||old||new||
|160000|0.190|0.007|
|320000|0.407|0.015|
|480000|0.615|0.108|
|640000|0.853|0.142|
|800000|1.075|0.140|
|960000|1.315|0.144|
|1120000|1.617|0.158|
|1280000|1.871|0.248|
|1440000|2.109|0.258|
|1600000|2.316|0.173|
|1760000|2.559|0.288|
|1920000|2.852|0.292|

I have tested this patch rather extensively to ensure its correctness. I 
subjected it to a test battery in which I randomly applied hundreds of 
thousands of arbitrary operations to various {{TreeLists}}, checking list 
invariants and contents after each test, so I am pretty confident in the 
correctness of this new code.

I encountered an unrelated bug in {{TreeListIterator}} which I have also fixed 
as part of this patch. Use of the {{remove}} operation could, for certain tree 
structures, cause the iterator to enter a bad state and return incorrect 
results. I include it with this patch because it was necessary to get a unit 
test to pass. (My changes to {{addAll}} changed the structure of a tree 
constructed in a unit test in a way that happened to elicit the bad behavior.) 
This change is minor compared with the substantial changes to {{addAll}} and 
the {{TreeList(Collection)}} constructor, but let me know if you'd like me to 
say more on this point.

Incidentally, I have not touched {{addAll(int, Collection)}}, the overload of 
{{addAll}} that adds the contents of a collection at a specified index, rather 
than at the end of the list. I believe that could be done efficiently too, but 
it would require a different algorithm.

This is my first contribution, so I hope I did things right. Please be gentle! 
:)
                
> TreeList.addAll() complexity
> ----------------------------
>
>                 Key: COLLECTIONS-433
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-433
>             Project: Commons Collections
>          Issue Type: Improvement
>    Affects Versions: 3.2.1
>            Reporter: Adrian Nistor
>         Attachments: COLLECTIONS-433.patch
>
>
> "TreeList.addAll(Collection coll)" has a higher complexity than
> necessary when "coll" is a "TreeList" object (because "addAll" just
> adds one element at a time).  This can be done in just O(N) as
> described for example here:
> http://stackoverflow.com/questions/4458489/merging-2-diferent-avl-trees
> Are there any plans to improve this?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to