[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351341#comment-15351341
 ] 

Branimir Lambov commented on CASSANDRA-9754:
--------------------------------------------

I spent some time reading up on {{BirchReader}} to figure out the nuts and bolts
of how the storage works. I think we can squeeze a little more efficiency out of
the structure:
- As far as I can see, your current implementation places many copies of keys
on the lower side of each span in the non-leaf nodes (for example, the lowest
key of the partition is present in the leaf node, in its parent, and in every
ancestor all the way up to the root). This should not be necessary: simply
omitting the first key (but retaining the child pointer) from all intermediate
nodes and adding 1 to the index the binary search returns will achieve the same
result.
- I find the overflow flag (and jumping back and forth to read it) less
efficient than necessary. If we instead assume that a key length equal to the
maximum always entails overflow data, we would use less space and be more
efficient in the common case, at the cost of a very small chance of taking a
few extra bytes in the uncommon situation of long keys.
- The root node could be placed in the same page as the descriptor (it is
usually small, so there is a high chance it fits). Perhaps the overflow data is
best placed elsewhere?
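To make the first suggestion concrete, here is a minimal Java sketch of the lookup it implies. It assumes an inner node that stores n child pointers but only n - 1 separator keys, where separator j is the lowest key of child j + 1. Keys are plain ints here, and the names (SeparatorSearch, floorIndex, childIndexFor) are hypothetical. This is an illustration, not the {{BirchReader}} on-disk format.

```java
/**
 * Hypothetical sketch (not the BirchReader format): an inner node of a B+ tree
 * that keeps n child pointers but only n - 1 separator keys. The lowest key of
 * child 0 is omitted, because any search key smaller than separators[0] must
 * descend into child 0 anyway.
 */
public final class SeparatorSearch {

    /**
     * Binary search for the floor: returns the index of the greatest separator
     * that is <= key, or -1 if every separator is greater than key.
     */
    static int floorIndex(int[] separators, int key) {
        int lo = 0, hi = separators.length - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (separators[mid] <= key) {
                result = mid;   // candidate floor; keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    /**
     * separators[j] is the lowest key of child j + 1, so the child to descend
     * into is floorIndex + 1 (child 0 when no separator is <= key).
     */
    static int childIndexFor(int[] separators, int key) {
        return floorIndex(separators, key) + 1;
    }

    public static void main(String[] args) {
        // Four children whose lowest keys are 10, 20, 30 and 40; the 10 is not stored.
        int[] separators = {20, 30, 40};
        for (int key : new int[] {5, 10, 19, 20, 35, 99}) {
            System.out.println(key + " -> child " + childIndexFor(separators, key));
        }
    }
}
```

The node saves one key per level on the lowest path, and the search result is still exact. A key below the partition's lowest key simply lands in child 0, where the leaf-level search handles it as before.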

More generally (ignoring padding on the leaves which is not necessarily always 
beneficial), the B+ structure you have built is practically a B-Tree index over 
a linear list of index entries. As we already have a linear list of 
{{IndexInfo}} structures in the current format, what do we gain by not simply
building a B-Tree index over that? To me the latter appears less complicated
and much more generic, with possible immediate applications in other parts of
the codebase.
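As a rough illustration of that alternative, here is a minimal Java sketch of a read-only B-Tree layered over an already sorted linear list. Each upper level samples every fanout-th key of the level below, and the list itself is left untouched as the leaf level. Entries are reduced to long first keys instead of {{IndexInfo}}'s clustering prefixes, and the names (StaticBTreeIndex, find, fanout) are hypothetical. This is a sketch under those assumptions, not Cassandra code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a static, read-only B-Tree built over an already sorted
 * linear list of index entries (reduced here to their first keys as longs).
 * Each upper level samples every fanout-th key of the level below, so the
 * original list is reused unchanged as the leaf level.
 */
public final class StaticBTreeIndex {
    private final List<long[]> levels = new ArrayList<>(); // levels.get(0) is the linear list
    private final int fanout;

    StaticBTreeIndex(long[] sortedFirstKeys, int fanout) {
        if (fanout < 2) throw new IllegalArgumentException("fanout must be >= 2");
        this.fanout = fanout;
        long[] level = sortedFirstKeys;
        levels.add(level);
        while (level.length > fanout) {
            long[] parent = new long[(level.length + fanout - 1) / fanout];
            for (int i = 0; i < parent.length; i++) {
                parent[i] = level[i * fanout]; // first key of each block of children
            }
            levels.add(parent);
            level = parent;
        }
    }

    /** Greatest index in [from, to) whose key is <= key, or from - 1 if none. */
    private static int floor(long[] keys, int from, int to, long key) {
        int lo = from, hi = to - 1, result = from - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] <= key) { result = mid; lo = mid + 1; } else { hi = mid - 1; }
        }
        return result;
    }

    /** Index of the entry whose range may contain key, or -1 if key precedes all entries. */
    int find(long key) {
        int top = levels.size() - 1;
        long[] topLevel = levels.get(top);
        int pos = floor(topLevel, 0, topLevel.length, key);
        if (pos < 0) return -1;
        for (int l = top - 1; l >= 0; l--) {
            long[] level = levels.get(l);
            int from = pos * fanout;
            int to = Math.min(from + fanout, level.length);
            // Never falls below 'from': level[from] equals the parent key, which is <= key.
            pos = floor(level, from, to, key);
        }
        return pos;
    }

    public static void main(String[] args) {
        long[] firstKeys = new long[100];
        for (int i = 0; i < firstKeys.length; i++) firstKeys[i] = 10L * i; // 0, 10, ..., 990
        StaticBTreeIndex index = new StaticBTreeIndex(firstKeys, 4);
        for (long key : new long[] {-1, 0, 555, 990, 5000}) {
            System.out.println(key + " -> entry " + index.find(key));
        }
    }
}
```

The upper levels hold only about 1/(fanout - 1) as many keys as the list itself. Because the structure knows nothing about what an entry is beyond its sort key, the same code could index any sorted on-disk list, which is the genericity argument above.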


> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the
> objects are IndexInfo and their ByteBuffers. This is especially bad on
> endpoints with large CQL partitions. If a CQL partition is, say, 6.4GB, it
> will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot
> of churn for GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
