[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380306#comment-15380306 ]

Michael Kjellman commented on CASSANDRA-9754:
---------------------------------------------

Thanks Branimir for starting to review the code!!

1) Yes, good optimization. The first entry in the tree (i.e. the first element in 
the root node) doesn't need to be repeated in the inner nodes, as you can assume 
you always go left for it.
2) I think we can make the assumption that if the length of a given entry's key 
is greater than or equal to the max key length for a given node/leaf, there will 
be an int encoded after the key bytes giving the offset into the overflow page. 
If the key happens to be exactly the max length (but doesn't actually overflow) 
we can encode 0, and the code knows that value means no overflow page. The 
downside here is adding more assumptions and complexity to the code vs. eating 
the single byte in the overflow page. (There's a rough sketch of this encoding 
below, after item 3.)
3) We could do this, but I think it increases complexity because the root node 
would need to be special cased for max size. Currently the root node, inner 
nodes, and leaf nodes all use the exact same code, where only the value 
serializer differs (it either writes the offset of the next node/leaf or the 
actual value of the entry being added to the tree). We'd need to subtract the 
size of the descriptor from the available size of the aligned root node and then 
know where to seek inside it.

In regards to your final comment: while technically we could build the 
root/inner nodes to point to an offset of IndexInfo entries serialized in the 
current format, I think the tradeoffs make it non-ideal. If we serialize the 
entries in the leaf like I'm currently doing, we (a) get the benefit of the leaf 
being aligned and padded, and (b) with the proposed leaf format, we can binary 
search inside each leaf node to find the exact entry, vs. having an inner node 
just point to an offset and then needing to linearly scan until we hit a match. 
(A rough illustration of the leaf binary search follows below.)
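
To illustrate why a structured leaf allows this, here's a hypothetical sketch 
(again, not the actual patch): assume each leaf starts with an entry count 
followed by a table of short offsets to each serialized, length-prefixed key, so 
any entry can be compared directly without scanning:

    import java.nio.ByteBuffer;

    final class LeafSearchSketch
    {
        // Returns the index of the entry whose key equals 'target', or -1 if absent.
        static int binarySearch(ByteBuffer leaf, byte[] target)
        {
            int count = leaf.getShort(0);                      // entry count at the start of the leaf
            int low = 0, high = count - 1;
            while (low <= high)
            {
                int mid = (low + high) >>> 1;
                int entryOffset = leaf.getShort(2 + 2 * mid);  // offset table follows the count
                int cmp = compareKeyAt(leaf, entryOffset, target);
                if (cmp < 0)
                    low = mid + 1;
                else if (cmp > 0)
                    high = mid - 1;
                else
                    return mid;
            }
            return -1;
        }

        // Compare the length-prefixed key stored at 'offset' against 'target',
        // byte by byte, treating bytes as unsigned.
        static int compareKeyAt(ByteBuffer leaf, int offset, byte[] target)
        {
            int len = leaf.getShort(offset);
            for (int i = 0; i < Math.min(len, target.length); i++)
            {
                int c = (leaf.get(offset + 2 + i) & 0xFF) - (target[i] & 0xFF);
                if (c != 0)
                    return c;
            }
            return Integer.compare(len, target.length);
        }
    }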

I'm almost done with a refactor for trunk. We found some pretty serious 
regressions in 2.1 that required my immediate attention, but I hope to have a 
trunk-based patch with your initial suggestions from above incorporated into the 
tree implementation very soon.


> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad in endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
