[
https://issues.apache.org/jira/browse/CASSANDRA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859941#comment-13859941
]
Benedict commented on CASSANDRA-6271:
-------------------------------------
bq. Why would we skip over all the potential intermediate keys in the root node?
This block is only reached once we've finished iterating through a complete
leaf. If the method call starts in a branch (meaning we were visiting a key
stored in a branch), there must be a child we can descend into to see more
elements. So on the first iteration the check can only succeed if the root is
a leaf and we're at its end; *otherwise* the code just after this check runs
on each parent branch before the isRoot() check, so we still visit the root's
keys correctly.
To be honest, looking at it now after the refactoring (before which I was using
this loop structure to save on some memory references), this should probably
now be just a while (!isRoot()) loop, which would make this a lot clearer.
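
As a rough illustration of the loop shape described above, here is a minimal
sketch of an in-order cursor over a B-tree-like structure. It is not the code
from the patch: the class, its Node type, the int keys and the fixed maximum
depth of 16 are all hypothetical, chosen only to show how a plain
while (!isRoot()) ascent still visits the keys stored in every parent branch,
including the root.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch, not the patch's implementation.
final class BTreeCursorSketch {
    // A leaf has children == null; a branch has keys.length + 1 children.
    static final class Node {
        final int[] keys;
        final Node[] children;
        Node(int[] keys, Node[] children) { this.keys = keys; this.children = children; }
        boolean isLeaf() { return children == null; }
    }

    // path[d] is the node at depth d; index[d] is the next key to visit in a
    // leaf, or the child we descended into in a branch (assumed depth <= 16).
    private final Node[] path = new Node[16];
    private final int[] index = new int[16];
    private int depth = 0;

    BTreeCursorSketch(Node root) {
        path[0] = root;
        descendToLeftmostLeaf();
    }

    private boolean isRoot() { return depth == 0; }

    private void descendToLeftmostLeaf() {
        while (!path[depth].isLeaf()) {
            Node child = path[depth].children[index[depth]];
            depth++;
            path[depth] = child;
            index[depth] = 0;
        }
    }

    boolean hasNext() {
        // A key remains if the current node or any ancestor has one left.
        for (int d = depth; d >= 0; d--)
            if (index[d] < path[d].keys.length) return true;
        return false;
    }

    int next() {
        // Common case: keys remain in the current leaf.
        if (index[depth] < path[depth].keys.length)
            return path[depth].keys[index[depth]++];
        // Leaf finished: climb until some parent still has a key to visit.
        while (!isRoot()) {
            depth--;
            if (index[depth] < path[depth].keys.length) {
                int key = path[depth].keys[index[depth]];
                index[depth]++;           // move on to the child after this key
                descendToLeftmostLeaf();  // ready for the following call
                return key;
            }
        }
        throw new NoSuchElementException();
    }
}
```

Because the branch check runs on each parent before isRoot() is tested again,
the root's own keys are returned like any other branch's.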
> Replace SnapTree in AtomicSortedColumns
> ---------------------------------------
>
> Key: CASSANDRA-6271
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6271
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Benedict
> Assignee: Benedict
> Labels: performance
> Attachments: oprate.svg
>
>
> On the write path a huge percentage of time is spent in GC (>50% in my tests,
> if accounting for slowdown due to parallel marking). SnapTrees are both
> GC-unfriendly due to their structure and very expensive to keep around:
> each column name in AtomicSortedColumns uses > 100 bytes on average
> (excluding the actual ByteBuffer).
> I suggest using a sorted array; changes are supplied all at once, as opposed
> to one at a time, and if < 10% of the keys in the array change (and data equal
> to < 10% of the size of the key array) we simply overlay a new array of
> changes only over the top. Otherwise we rewrite the array. This method should
> ensure much less GC overhead, and also save approximately 80% of the current
> memory overhead.
> TreeMap is a similarly difficult object for the GC, and a related task might
> be to remove it where not strictly necessary, even though we don't keep them
> hanging around for long. TreeMapBackedSortedColumns, for instance, seems to
> be used in a lot of places where we could simply sort the columns.
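
The sorted-array proposal in the description could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
attached patch: the class name, the int keys and values, and the exact
threshold test (the accumulated overlay is kept only while it is smaller than
10% of the base array) are hypothetical simplifications of the "< 10%" rule.

```java
import java.util.Arrays;

// Hypothetical sketch: an immutable sorted array with a small overlay of changes.
final class OverlaySortedArraySketch {
    final int[] baseKeys, baseVals;   // large sorted array
    final int[] overKeys, overVals;   // small sorted overlay of recent changes

    OverlaySortedArraySketch(int[] baseKeys, int[] baseVals, int[] overKeys, int[] overVals) {
        this.baseKeys = baseKeys; this.baseVals = baseVals;
        this.overKeys = overKeys; this.overVals = overVals;
    }

    static OverlaySortedArraySketch empty() {
        return new OverlaySortedArraySketch(new int[0], new int[0], new int[0], new int[0]);
    }

    // Applies a whole batch (sorted, unique keys) at once and returns a new snapshot.
    OverlaySortedArraySketch apply(int[] keys, int[] vals) {
        int[][] overlay = merge(overKeys, overVals, keys, vals);
        // Assumed rule: small change sets are only overlaid on top of the base.
        if (overlay[0].length * 10 < baseKeys.length)
            return new OverlaySortedArraySketch(baseKeys, baseVals, overlay[0], overlay[1]);
        // Otherwise rewrite the base array with every change folded in.
        int[][] rewritten = merge(baseKeys, baseVals, overlay[0], overlay[1]);
        return new OverlaySortedArraySketch(rewritten[0], rewritten[1], new int[0], new int[0]);
    }

    // Looks in the overlay first, so newer values shadow the base; null if absent.
    Integer get(int key) {
        int i = Arrays.binarySearch(overKeys, key);
        if (i >= 0) return overVals[i];
        i = Arrays.binarySearch(baseKeys, key);
        return i >= 0 ? baseVals[i] : null;
    }

    // Merges two sorted key/value arrays; on equal keys the second (b) wins.
    static int[][] merge(int[] ak, int[] av, int[] bk, int[] bv) {
        int[] k = new int[ak.length + bk.length];
        int[] v = new int[k.length];
        int i = 0, j = 0, n = 0;
        while (i < ak.length || j < bk.length) {
            if (j == bk.length || (i < ak.length && ak[i] < bk[j])) {
                k[n] = ak[i]; v[n++] = av[i++];
            } else {
                if (i < ak.length && ak[i] == bk[j]) i++;   // drop the shadowed entry
                k[n] = bk[j]; v[n++] = bv[j++];
            }
        }
        return new int[][] { Arrays.copyOf(k, n), Arrays.copyOf(v, n) };
    }
}
```

Each snapshot holds a handful of flat arrays rather than one object per entry,
which is the property the description relies on for reducing GC pressure.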