[ 
https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860614#comment-13860614
 ] 

Benedict commented on CASSANDRA-6271:
-------------------------------------

It sort of does, it just does it all in a batch and does it to a new copy.

1. It finds the appropriate leaf (but scanning from the previous insertion 
point, instead of the root of the tree)
1a. It copies any elements between the previous insertion point and the new 
insertion point into the new tree (necessary because we're immutable, but does 
not affect semantics)
2. Puts it in the leaf if there is room (if not splits), but since it can be 
inserted ahead of some pre-existing elements being copied to the new version, 
the split may be delayed until *they* are copied forwards in the next (1a) 
step. These later copies can also be thought of as inserts.
3. Splits the parent if necessary  (in the addChild method, which is called 
whenever we have a completed node to pass to the parent)

It can also, perhaps more easily, be thought of as performing an optimised 
insert of the union of all elements in the original tree with the updating 
collection. Instead of going through each item in the original tree, when a 
sub-tree range doesn't intersect with the updating collection it doesn't 
descend into the tree, but copies it as is (semantically equivalent to 
inserting them all). 

If they do intersect, it descends and performs the optimised insert on the 
sub-tree. The optimised insert on a leaf is simply a binary search for the 
insertion point, with an array copy to move any leaves between the last 
insertion point and the new insertion point. If our next insert takes as back 
up out of a sub-tree/leaf, we just finish inserting all of the remaining 
elements from the original tree, spilling up when necessary as with any insert.

If at any point any of the inserts causes an overflow we "split" by creating a 
node with half of all elements we've buffered to a new node and adding it 
immediately to the parent. This can happen an arbitrary number of times for a 
sub-tree, since we don't bound the size of the updating collection, a huge 
portion of which which could intersect with a given sub-tree.


> Replace SnapTree in AtomicSortedColumns
> ---------------------------------------
>
>                 Key: CASSANDRA-6271
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6271
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>         Attachments: oprate.svg
>
>
> On the write path a huge percentage of time is spent in GC (>50% in my tests, 
> if accounting for slow down due to parallel marking). SnapTrees are both GC 
> unfriendly due to their structure and also very expensive to keep around - 
> each column name in AtomicSortedColumns uses > 100 bytes on average 
> (excluding the actual ByteBuffer).
> I suggest using a sorted array; changes are supplied at-once, as opposed to 
> one at a time, and if < 10% of the keys in the array change (and data equal 
> to < 10% of the size of the key array) we simply overlay a new array of 
> changes only over the top. Otherwise we rewrite the array. This method should 
> ensure much less GC overhead, and also save approximately 80% of the current 
> memory overhead.
> TreeMap is similarly difficult object for the GC, and a related task might be 
> to remove it where not strictly necessary, even though we don't keep them 
> hanging around for long. TreeMapBackedSortedColumns, for instance, seems to 
> be used in a lot of places where we could simply sort the columns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to