[
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351341#comment-15351341
]
Branimir Lambov commented on CASSANDRA-9754:
--------------------------------------------
I spent some time reading through {{BirchReader}} to figure out the nuts and bolts
of how the storage works. I think we can squeeze a little more efficiency out of
the structure:
- As far as I could see, your current implementation places a lot of copies of
keys on the lower side of each span in the non-leaf nodes (for example, the
lowest key of the partition is present in the leaf node, in its parent, and in
every ancestor all the way up to the root). This should not be necessary:
simply omitting the first key (but retaining its child pointer) from all
intermediate nodes and adding 1 to what the binary search returns achieves the
same result. There is a rough sketch of what I mean below this list.
- I find the overflow flag (and jumping back and forth to read it) less
efficient than necessary. If we instead assume that a key length equal to the
maximum always entails overflow data, we would use less space and be more
efficient in the common case, while having only a very low chance of spending a
few extra bytes in the uncommon situation of long keys (also sketched below).
- The root node could live in the same page as the descriptor (it is usually
small, so there is a good chance it will fit). Perhaps the overflow is better
placed elsewhere?
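Roughly what I have in mind for the first two points, as a minimal sketch
(hypothetical names and types, not the actual Birch node layout):
{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

/**
 * Minimal sketch (hypothetical names, not the actual Birch node layout) of the
 * first two points above: an intermediate node that stores one more child
 * pointer than it stores keys, and an overflow convention inferred from key length.
 */
final class IntermediateNodeSketch
{
    /**
     * separators[i] is the lowest key of children[i + 1]; the lowest key of
     * children[0] is omitted, because any search key smaller than separators[0]
     * has to descend into children[0] anyway.
     */
    static long findChild(ByteBuffer[] separators, long[] children,
                          ByteBuffer searchKey, Comparator<ByteBuffer> cmp)
    {
        int idx = Arrays.binarySearch(separators, searchKey, cmp);
        // On a hit, descend one child to the right of the matching separator;
        // on a miss, binarySearch returns -(insertionPoint) - 1 and the
        // insertion point is already the correct child slot.
        int child = idx >= 0 ? idx + 1 : -idx - 1;
        return children[child];
    }

    /**
     * Instead of storing and re-reading an explicit overflow flag, treat an
     * inline key whose serialized length equals the maximum as a signal that
     * the remainder of the key lives in the overflow area.
     */
    static boolean keyHasOverflow(int inlineKeyLength, int maxInlineKeyLength)
    {
        return inlineKeyLength == maxInlineKeyLength;
    }
}
{code}
With the first key omitted, an intermediate node simply holds one more child
pointer than separator keys, and the +1 falls out of the binary search naturally.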
More generally (ignoring the padding on the leaves, which is not necessarily
always beneficial), the B+ structure you have built is practically a B-Tree
index over a linear list of index entries. As we already have a linear list of
{{IndexInfo}} structures in the current format, what are we gaining by not just
building a B-Tree index over that? To me the latter would appear to be less
complicated and much more generic, with immediate possible applications in
other parts of the codebase.
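To make that concrete, here is a sketch (plain generics, nothing tied to the
on-disk format) of what an index over the existing linear list could look like:
sample every N-th entry key into an upper level and let a lookup binary-search
only a small slice of the flat list; further levels would be built the same way
over the level below if the sampled level itself grows large.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch (not Cassandra code) of a "B-Tree index over a linear
 * list": the IndexInfo entries stay in their existing sorted, linear order,
 * and a sampled key level on top of them directs a lookup to a small slice
 * of that list.
 */
final class SampledIndexSketch<K>
{
    private static final int FAN_OUT = 128;             // entries covered by one sampled key

    private final List<K> blockKeys;                    // first key of each IndexInfo block, sorted
    private final List<K> sampled = new ArrayList<>();  // every FAN_OUT-th block key
    private final Comparator<? super K> cmp;

    SampledIndexSketch(List<K> blockKeys, Comparator<? super K> cmp)
    {
        this.blockKeys = blockKeys;
        this.cmp = cmp;
        for (int i = 0; i < blockKeys.size(); i += FAN_OUT)
            sampled.add(blockKeys.get(i));
    }

    /** Index of the last IndexInfo block whose first key is <= searchKey, or -1 if none. */
    int floorIndex(K searchKey)
    {
        int group = floor(sampled, searchKey);          // which FAN_OUT-sized slice to look at
        if (group < 0)
            return -1;                                  // searchKey precedes the first block
        int from = group * FAN_OUT;
        int to = Math.min(from + FAN_OUT, blockKeys.size());
        return from + floor(blockKeys.subList(from, to), searchKey);
    }

    private int floor(List<K> keys, K key)
    {
        int idx = Collections.binarySearch(keys, key, cmp);
        return idx >= 0 ? idx : -idx - 2;               // last element <= key, or -1
    }
}
{code}
The flat {{IndexInfo}} list stays exactly as it is today; only the sampled
levels would be new, and they could be serialized after the list or kept
in memory.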
> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Michael Kjellman
> Priority: Minor
> Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the
> objects are IndexInfo and its ByteBuffers. This is especially bad in endpoints
> with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have
> 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn
> for GC. Can this be improved by not creating so many objects?