Michael Kjellman commented on CASSANDRA-9754:

In regards to your second point: I'm actually only using the key cache in 
the current implementation if a) it's a legacy index that hasn't been upgraded 
yet (to keep performance for indexed rows the same during upgrades), or b) the 
row is non-indexed, i.e. < 64kb, so only the starting offset is stored.

Birch-indexed rows always come from the Birch implementation on disk and don't 
get stored in the key cache at all. Ideally I think it would be great if we 
could get rid of the key cache altogether! There was some chat about this 
earlier in the ticket...

There is the index summary, which has an offset for keys as they are sampled 
during compaction; that lets you skip to a given starting file offset inside 
the index for a key, which reduces the problem you're talking about. I don't 
think the performance of the small-to-medium sized case should be any different 
with the Birch implementation than with the current one, and I'm trying to test 
that with the workload running against the test_keyspace.largeuuid1 table. The 
issue with the Birch implementation vs. the current one, though, is going to be 
the size of the index file on disk due to the segments being aligned on 4kb 
boundaries. I've talked about this a bunch and thrown some ideas around with 
people, and I think the best option may be to check whether the previously 
added row was a non-indexed segment (so just a long for the start of the 
partition in the index and no tree being built) and, in that case, not align 
the file to a boundary. The real issue is that I don't know the length ahead of 
time, so I can't just encode the aligned segments at the end starting at some 
starting offset and encode relative offsets iteratively during compaction. Any 
thoughts on this would be really appreciated though...
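
To make the conditional-padding idea concrete, here's a rough sketch 
(hypothetical class and parameter names, not the actual patch): pad out to the 
next 4kb boundary only when the entry just written was a Birch-indexed segment, 
and leave non-indexed entries (a single long) unpadded.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Rough sketch of the conditional alignment idea above; SegmentAligner and
    // maybeAlign() are illustrative names, not part of the actual patch.
    class SegmentAligner
    {
        private static final int ALIGNMENT = 4096; // 4kb segment boundary

        // Call after writing each index entry during compaction.
        static void maybeAlign(RandomAccessFile indexFile, boolean previousWasIndexed) throws IOException
        {
            if (!previousWasIndexed)
                return; // previous entry was just an 8-byte offset, no tree built

            long misalignment = indexFile.getFilePointer() % ALIGNMENT;
            if (misalignment != 0)
                indexFile.write(new byte[(int) (ALIGNMENT - misalignment)]); // zero padding
        }
    }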

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>         Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the 
> objects are IndexInfo and its ByteBuffers. This is especially bad in endpoints 
> with large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 
> 100K IndexInfo objects and 200K ByteBuffers. This creates a lot of churn for 
> GC. Can this be improved by not creating so many objects?
