[
https://issues.apache.org/jira/browse/CASSANDRA-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024528#comment-15024528
]
Branimir Lambov commented on CASSANDRA-9991:
--------------------------------------------
Thank you for the patch. You are right to question the worth of saving some
work at the expense of code. I'd actually prefer to go the other way and
simplify the code more.
It's not obvious why you want to have {{remove}} separated from
{{removeFromLeft}} -- I can see that it is necessary to avoid copying /
reorganizing when the key is not in the tree, but that could be stated in a
comment to save some reader the time to figure it out. The same goes for the
need for {{swap}} to walk the whole tree again, a comment would help a lot.
I'd extract [the predecessor loop in
{{remove}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R42]
to a method in {{BTree}}. Finding last element (using {{findByIndex(tree,
size-1)}}) is also done in a few other places, you can point them to that
method as well.
The ordering in
[{{removeFromLeaf}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R82]
is odd: We first look for the easy case, then switch to the most difficult
one, than go back to an easier one (also repeating a check). Why not check if
left is large enough first, if not check if right is large enough, if not do
the join?
{{removeLeft / removeRight}} are very unreadable. Perhaps extract
{{copyRemoving/copyInserting}} methods and call them?
[The newNextNode size
calculation|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R211]
in {{removeLeft/Right}} looks suspect; it only works because {{FAN_FACTOR}} is
a power of 2. I think it would be much clearer if you enforced the invariant by
using something like {{(nextKeyEnd + 1) | 1}}.
[{{copyBranchWithKeyAndChildRemoved}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R279]
is only used with (and seems to only make sense with) {{keyIndex ==
childIndex}}, you can remove the other parameter.
{{System.arraycopy}} can deal with length 0, remove the unnecessary checks [in
{{copyKeys/Children}}|https://github.com/apache/cassandra/compare/trunk...blambov:haaawk/9991#diff-fd77b44ecefe258afe8334dd2b4b63b7R330].
The test coverage is great, but I'd like to have a more exhaustive random test
as well -- something like removing everything in random order from a large
{{BTree}}. I also noticed {{isWellFormed}} does not check branch node size,
could you add that too?
Going further, the {{BTree}} infrastructure includes methods that deal in
element indices. You can use {{findIndex}} instead of the code in {{remove}}
and then do the removal walk using indices, which will replace a few of the
binary searches in {{removeFromLeaf}} with integer ones. Since you are then
certain that you need to remove an element, it would be easy to integrate the
predecessor swap into the removal pass.
I'm curious if you tried the recursive deletion with bottom-up reorganization
alternative? That looks like it would be simpler if we are okay with a few
extra allocations and copying, which we may be as it shouldn't happen too
often. What do you think?
> Implement efficient btree removal
> ---------------------------------
>
> Key: CASSANDRA-9991
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9991
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Benedict
> Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9991.txt
>
>
> Currently removal is implemented as a reconstruction by filtering and
> iterator over the original btree. This could be much more efficient, editing
> just the necessary nodes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)