[
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380306#comment-15380306
]
Michael Kjellman commented on CASSANDRA-9754:
---------------------------------------------
Thanks Branimir for starting to review the code!!
1) Yes, good optimization. The left-most entry in the tree (i.e. the first
element in the root node) doesn't need to be repeated in the inner nodes, as you
can assume that you always go to the left.
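A rough sketch of the child lookup I have in mind (names are made up, not the actual patch): the inner node stores one fewer key than it has children, and anything smaller than the first stored key simply falls through to the left-most child.
{code:java}
import java.nio.ByteBuffer;
import java.util.Comparator;

class InnerNodeSearch
{
    // keys.length == childCount - 1: the left-most key is implicit.
    // Returns the index of the child to descend into for searchKey.
    static int findChildIndex(ByteBuffer searchKey, ByteBuffer[] keys, Comparator<ByteBuffer> cmp)
    {
        int lo = 0, hi = keys.length - 1;
        int child = 0; // default: everything left of the first stored key
        while (lo <= hi)
        {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(searchKey, keys[mid]) >= 0)
            {
                child = mid + 1; // descend to the right of keys[mid]
                lo = mid + 1;
            }
            else
            {
                hi = mid - 1;
            }
        }
        return child;
    }
}
{code}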
2) I think we can make the assumption that if the length of the bytes for a
given entry's key is greater than or equal to the max length for a given
node/leaf, there will be an int encoded after the key bytes containing the
offset into the overflow page. If the key happens to be exactly the max length
(but doesn't actually overflow) then we can encode 0 or something and have the
code know that value means no overflow page. The downside here is adding more
assumptions and complexity to the code vs eating the single byte in the
overflow page.
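Something like this is what I mean (purely illustrative; the method/parameter names and the assumption of a known inline key length are invented for the sketch):
{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

class OverflowEncoding
{
    static final int NO_OVERFLOW = 0;

    // Write at most maxKeyLength key bytes inline. Only when we write exactly
    // maxKeyLength bytes does an int follow: either the offset into the
    // overflow page, or 0 when the key just happens to fit exactly.
    static void writeKey(DataOutput out, byte[] key, int maxKeyLength, int overflowOffset) throws IOException
    {
        int inline = Math.min(key.length, maxKeyLength);
        out.write(key, 0, inline);
        if (inline == maxKeyLength)
            out.writeInt(key.length > maxKeyLength ? overflowOffset : NO_OVERFLOW);
    }

    // Readers only pay for the trailing int when the inline key is "full";
    // the inline key length is assumed known from the entry's length prefix.
    static int readOverflowOffset(DataInput in, int inlineKeyLength, int maxKeyLength) throws IOException
    {
        if (inlineKeyLength < maxKeyLength)
            return NO_OVERFLOW;
        return in.readInt(); // 0 means no overflow page
    }
}
{code}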
3) Could do this but I think it increases the complexity, as the root node would
need to be special cased for max size. Currently the root node, inner nodes,
and leaf nodes all use the exact same code, where only the value serializer is
different (to either write the offset to the next node/leaf or the actual value
of the entry you want to add to the tree). We'd need to subtract the size of
the descriptor from the available size of the aligned root node and then know
where to seek inside it.
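The special casing would look roughly like this (method names are placeholders, just to show the extra bookkeeping):
{code:java}
class RootNodeSizing
{
    // The root would share its aligned page with the descriptor, so its
    // entries could only use whatever is left; inner and leaf nodes keep
    // the full aligned page.
    static int availableRootBytes(int alignedNodeSize, int descriptorSerializedSize)
    {
        return alignedNodeSize - descriptorSerializedSize;
    }

    // Readers would need to seek past the descriptor to reach the root's
    // first entry instead of starting at the page boundary.
    static long rootEntriesOffset(long rootPageOffset, int descriptorSerializedSize)
    {
        return rootPageOffset + descriptorSerializedSize;
    }
}
{code}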
In regards to your final comment: while technically we could build the
root/inner nodes to point to an offset of IndexInfo entries serialized in the
current format, I think the tradeoffs make it non-ideal. If we serialize the
entries in the leaf like I'm currently doing, we a) get the benefits of the leaf
being aligned and padded and b) with the proposed leaf format, we can binary
search inside each leaf node to get the exact entry, vs having an inner node
just point to an offset and then needing to linearly scan until we hit a match.
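To illustrate b): with a small table of fixed-width offsets at the front of each leaf, the lookup can be a binary search over the leaf's own entries (the layout and the keyAt helper below are hypothetical, just to show the idea, not the actual serialization):
{code:java}
import java.nio.ByteBuffer;
import java.util.Comparator;

class LeafSearch
{
    // Assumed leaf layout: [entryCount:int][offset_0:int ... offset_{n-1}:int][entries...]
    // where each entry starts with a short key-length prefix followed by the key bytes.
    static int binarySearch(ByteBuffer leaf, ByteBuffer searchKey, Comparator<ByteBuffer> cmp)
    {
        int entryCount = leaf.getInt(0);
        int lo = 0, hi = entryCount - 1;
        while (lo <= hi)
        {
            int mid = (lo + hi) >>> 1;
            int entryOffset = leaf.getInt(4 + 4 * mid);
            int c = cmp.compare(keyAt(leaf, entryOffset), searchKey);
            if (c == 0)
                return mid;
            if (c < 0)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -(lo + 1); // not found; same convention as Arrays.binarySearch
    }

    private static ByteBuffer keyAt(ByteBuffer leaf, int entryOffset)
    {
        int len = leaf.getShort(entryOffset) & 0xFFFF;
        ByteBuffer key = leaf.duplicate();
        key.position(entryOffset + 2);
        key.limit(entryOffset + 2 + len);
        return key.slice();
    }
}
{code}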
I'm almost done with a refactor for trunk. We found some pretty serious
regressions in 2.1 that required my immediate attention, but I hope to have a
trunk-based patch with your initial suggestions from above incorporated into
the tree implementation very soon.
> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Michael Kjellman
> Priority: Minor
> Fix For: 4.x
>
> Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the
> objects are IndexInfo and its ByteBuffers. This is especially bad on endpoints
> with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have
> 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn
> for GC. Can this be improved by not creating so many objects?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)