[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190718#comment-15190718 ]

Michael Kjellman commented on CASSANDRA-9754:
---------------------------------------------

I have the new FileSegment-friendly implementation working for the following
cases:

1) straight search for a key -> get its value
2) efficient iteration, both forwards and reversed, through all elements in the tree
3) binary search for a given key, then iteration through all remaining keys from
the found offset
4) overflow pages for handling variable-length tree elements that exceed the
maximum size of an individual page (up to 2GB)
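Operations 1 to 3 above can be illustrated with a deliberately simplified sketch. It is not the FileSegment-backed tree or its binary format: a pair of sorted `long[]` arrays stands in for the tree's entries, and the class and method names (`SortedIndexSketch`, `get`, `allKeys`, `keysFrom`) are hypothetical. Overflow pages (case 4) are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the tree: keys and values kept in parallel sorted
// arrays. This is NOT the on-disk format described in the comment.
final class SortedIndexSketch {
    private final long[] keys;   // sorted ascending, no duplicates (assumed)
    private final long[] values; // values[i] belongs to keys[i]

    SortedIndexSketch(long[] sortedKeys, long[] values) {
        if (sortedKeys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have equal length");
        }
        this.keys = sortedKeys.clone();
        this.values = values.clone();
    }

    // 1) Exact lookup: returns the value for key, or null if the key is absent.
    Long get(long key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }

    // 2) Iterate through every key, forwards or reversed.
    List<Long> allKeys(boolean reversed) {
        List<Long> out = new ArrayList<>(keys.length);
        if (reversed) {
            for (int i = keys.length - 1; i >= 0; i--) out.add(keys[i]);
        } else {
            for (long k : keys) out.add(k);
        }
        return out;
    }

    // 3) Binary search for the first key >= from, then iterate the remaining keys.
    List<Long> keysFrom(long from) {
        int i = Arrays.binarySearch(keys, from);
        int start = i >= 0 ? i : -(i + 1); // insertion point when the key is missing
        List<Long> out = new ArrayList<>(keys.length - start);
        for (int j = start; j < keys.length; j++) out.add(keys[j]);
        return out;
    }

    public static void main(String[] args) {
        SortedIndexSketch idx = new SortedIndexSketch(
            new long[] {10, 20, 30, 40}, new long[] {100, 200, 300, 400});
        System.out.println(idx.get(30));
        System.out.println(idx.allKeys(true));
        System.out.println(idx.keysFrom(25));
    }
}
```

The seek-then-iterate case relies on `Arrays.binarySearch` returning `-(insertionPoint) - 1` for a missing key, so the scan starts at the first key greater than the one requested.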

I have also successfully run some new unit tests I wrote that do 5000
consecutive iterations with randomly generated data (to "fuzz" the tree for
edge conditions), building and validating trees that contain between
300,000 and 500,000 elements. I've also spent a good amount of time writing some
pretty reasonable documentation of the binary format itself.
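A fuzzing loop of the kind described above might look roughly like this. It is a hypothetical harness, not the actual unit tests: `build` just copies sorted keys into a `long[]` instead of serializing a tree, a `TreeMap` serves as the reference oracle, and the iteration count and sizes in `main` are scaled-down assumptions.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical fuzz harness: each round builds a structure from random data
// and validates it against a TreeMap oracle.
final class FuzzSketch {

    // Stand-in builder; the real code would serialize a tree here instead.
    static long[] build(TreeMap<Long, Long> data) {
        long[] keys = new long[data.size()];
        int i = 0;
        for (long k : data.keySet()) keys[i++] = k;
        return keys;
    }

    // Checks size, forward order, per-key lookup, and reversed order.
    static void validate(long[] built, TreeMap<Long, Long> oracle) {
        if (built.length != oracle.size()) throw new AssertionError("size mismatch");
        int i = 0;
        for (long k : oracle.keySet()) {
            if (built[i] != k) throw new AssertionError("forward mismatch at " + i);
            if (Arrays.binarySearch(built, k) != i) throw new AssertionError("lookup failed for " + k);
            i++;
        }
        i = built.length - 1;
        for (long k : oracle.descendingKeySet()) {
            if (built[i--] != k) throw new AssertionError("reverse mismatch");
        }
    }

    // Runs `iterations` rounds, each with between minSize and maxSize elements.
    static int run(long seed, int iterations, int minSize, int maxSize) {
        Random rnd = new Random(seed); // fixed seed so a failing round can be replayed
        int completed = 0;
        for (int it = 0; it < iterations; it++) {
            int size = minSize + rnd.nextInt(maxSize - minSize + 1);
            TreeMap<Long, Long> oracle = new TreeMap<>();
            while (oracle.size() < size) oracle.put(rnd.nextLong(), rnd.nextLong());
            validate(build(oracle), oracle);
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        // The comment describes 5000 iterations of 300,000-500,000 elements;
        // these values are scaled down so the sketch finishes quickly.
        System.out.println(run(42L, 50, 3_000, 5_000) + " iterations passed");
    }
}
```

Seeding `Random` explicitly is a deliberate choice: when a randomized round fails, rerunning with the same seed reproduces the exact input that triggered it.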

Tomorrow, I'm planning on testing a single 4.5GB tree against the new
implementation and doing some profiling to see the exact memory impact, now that
it's basically complete on both the serialization and deserialization paths.
Will update with those findings tomorrow!

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects
> are IndexInfo objects and their ByteBuffers. This is especially bad on endpoints with
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for
> GC. Can this be improved by not creating so many objects?
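The figures in the description follow from simple arithmetic, sketched below under two assumptions that the issue does not state: one IndexInfo is created per `column_index_size_in_kb` of partition data (64 KB by default), and each IndexInfo holds two ByteBuffers (its first and last column name). `IndexInfoEstimate` is a hypothetical helper.

```java
// Back-of-the-envelope estimate of per-partition index objects.
// Assumes one IndexInfo per index block and two ByteBuffers per IndexInfo.
final class IndexInfoEstimate {

    // Ceiling division: a partial trailing block still gets its own entry.
    static long indexInfoCount(long partitionBytes, long indexBlockBytes) {
        return (partitionBytes + indexBlockBytes - 1) / indexBlockBytes;
    }

    public static void main(String[] args) {
        long partition = 6_400_000_000L; // ~6.4GB partition
        long block = 64L * 1024;         // assumed default column_index_size_in_kb = 64
        long infos = indexInfoCount(partition, block);
        long buffers = 2 * infos;        // first + last column name per entry (assumed)
        System.out.println(infos + " IndexInfo, " + buffers + " ByteBuffers");
    }
}
```

With these inputs it prints `97657 IndexInfo, 195314 ByteBuffers`, in line with the roughly 100K and 200K figures quoted above.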



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
