[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235141#comment-15235141 ]
Jack Krupansky commented on CASSANDRA-9754:
-------------------------------------------
Any idea how a wide partition will perform with this change, relative to the
same amount of data and the same number of clustering rows divided into bucketed
partitions? For example, a single 1 GB wide partition vs. ten 100 MB partitions
(same partition key plus a 0-9 bucket number) vs. a hundred 10 MB partitions
(0-99 bucket number), for two access patterns: 1) random access to a row or
short slice, and 2) a full bulk read of the 1 GB of data, one moderate slice at
a time.
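To make the two access patterns concrete, here is a minimal Python sketch of the bucketed layout. It assumes rows are numbered in clustering order and assigned to buckets by position (real schemes often bucket by time instead). `BucketedKey`, `bucket_for`, `point_lookup`, and `bulk_read_plan` are hypothetical names; in CQL the composite key would correspond to something like `PRIMARY KEY ((key, bucket), ...)`.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class BucketedKey:
    """Hypothetical composite partition key: the logical key plus a bucket number."""
    key: str
    bucket: int


def bucket_for(row_ordinal: int, rows_per_bucket: int) -> int:
    # Rows are bucketed by position in clustering order: bucket 0 holds the
    # first rows_per_bucket rows, bucket 1 the next batch, and so on.
    return row_ordinal // rows_per_bucket


def point_lookup(key: str, row_ordinal: int, rows_per_bucket: int) -> Tuple[BucketedKey, int]:
    # Access pattern 1: a single row touches exactly one bucket; return that
    # bucket's partition key and the row's position inside it.
    return BucketedKey(key, bucket_for(row_ordinal, rows_per_bucket)), row_ordinal % rows_per_bucket


def bulk_read_plan(key: str, num_buckets: int, rows_per_bucket: int,
                   slice_rows: int) -> Iterator[Tuple[BucketedKey, int, int]]:
    # Access pattern 2: walk every bucket in order, one moderate slice at a time.
    # Yields (partition, start_row, end_row_exclusive) within each partition.
    for bucket in range(num_buckets):
        for start in range(0, rows_per_bucket, slice_rows):
            yield BucketedKey(key, bucket), start, min(start + slice_rows, rows_per_bucket)
```

With one million rows split into ten buckets of 100,000, `point_lookup("sensor-42", 999_999, 100_000)` lands in bucket 9 at offset 99,999, so a random read touches one small partition, while the bulk read simply walks buckets 0 through 9 in order.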
Or maybe the question is equivalent to asking how the cost of accessing the
last row of the 1 GB partition compares with the cost of accessing the last row
of the tenth or hundredth bucket in the bucketed equivalent.
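A rough model of the last-row question follows. It assumes ~1 KB rows, one index entry per ~64 KB block (the `column_index_size_in_kb` default), and, as in the 2.x format the ticket describes, that a partition's full list of index entries is materialized before it is binary-searched. `IndexEntry` is a simplified, hypothetical stand-in for `IndexInfo`, and the sizes are illustrative, not measurements.

```python
from dataclasses import dataclass
from typing import List, Tuple

ROW_BYTES = 1_000     # assumed average row size (~1 KB)
ROWS_PER_BLOCK = 64   # ~64 KB per index block, assuming column_index_size_in_kb = 64


@dataclass(frozen=True)
class IndexEntry:
    """Simplified stand-in for IndexInfo: clustering bounds of one block plus its position."""
    first_row: int
    last_row: int
    offset: int
    width: int


def build_index(total_rows: int) -> List[IndexEntry]:
    # One entry per ROWS_PER_BLOCK rows, in clustering order.
    entries = []
    for first in range(0, total_rows, ROWS_PER_BLOCK):
        last = min(first + ROWS_PER_BLOCK, total_rows) - 1
        entries.append(IndexEntry(first, last, first * ROW_BYTES, (last - first + 1) * ROW_BYTES))
    return entries


def find_block(index: List[IndexEntry], row: int) -> Tuple[int, int]:
    # Lower-bound binary search on last_row; returns (block position, probes used).
    lo, hi, probes = 0, len(index), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if index[mid].last_row < row:
            lo = mid + 1
        else:
            hi = mid
    return lo, probes


def last_row_cost(total_rows: int, buckets: int) -> Tuple[int, int]:
    # The last row lives in the last bucket, so only that bucket's index is loaded.
    rows_per_bucket = total_rows // buckets
    index = build_index(rows_per_bucket)
    _, probes = find_block(index, rows_per_bucket - 1)
    return len(index), probes


for buckets in (1, 10, 100):
    entries, probes = last_row_cost(1_000_000, buckets)
    print(f"{buckets:>3} bucket(s): {entries:>6} index entries loaded, {probes} probes")
```

In this model the entries loaded drop from 15,625 for the single partition to 1,563 per 100 MB bucket and 157 per 10 MB bucket, while the number of binary-search probes shrinks only logarithmically, so most of the difference comes from materializing the index rather than from searching it.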
No precision required. Just inquiring whether we can get rid of bucketing as a
preferred data modeling strategy, at least for the common use cases where the
sum of the buckets is roughly 2 GB or less.
The bucketing approach does have the side effect of distributing the buckets
around the cluster, which could be a good thing, or maybe not.
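The distribution side effect can be illustrated by hashing each bucket's partition key onto a token ring. This sketch uses an MD5-derived 128-bit token loosely inspired by `RandomPartitioner` (the default `Murmur3Partitioner` uses a different hash), joins key and bucket as a plain string rather than using Cassandra's composite-key serialization, places nodes at evenly spaced tokens, and ignores replication. All names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List

TOKEN_SPACE = 2 ** 128  # MD5 digests are 128 bits


def token(partition_key: str) -> int:
    # MD5 of the (simplified) serialized key, read as an unsigned integer.
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def make_ring(nodes: int) -> List[int]:
    # Evenly spaced node tokens; node i owns the range (ring[i-1], ring[i]].
    return [(i + 1) * TOKEN_SPACE // nodes - 1 for i in range(nodes)]


def owner(tok: int, ring: List[int]) -> int:
    # First node whose token is >= tok, wrapping around to node 0.
    return bisect.bisect_left(ring, tok) % len(ring)


def bucket_owners(key: str, buckets: int, nodes: int) -> Dict[int, int]:
    # Hypothetical "key:bucket" encoding stands in for a composite partition key.
    ring = make_ring(nodes)
    return {b: owner(token(f"{key}:{b}"), ring) for b in range(buckets)}
```

A single 1 GB partition always maps to one token, and therefore one replica set, whereas its buckets usually spread across several nodes: that spreads load, but a full bulk read then has to contact more of them.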
> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Michael Kjellman
> Priority: Minor
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the
> objects are IndexInfo objects and their ByteBuffers. This is especially bad on
> endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it
> will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot
> of churn for GC. Can this be improved by not creating so many objects?
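The description's figures can be checked with back-of-the-envelope arithmetic, assuming `column_index_size_in_kb` is at its default of 64, treating GB as GiB, and assuming each `IndexInfo` holds two `ByteBuffer`s (the first and last clustering name of its block). `index_objects` is a hypothetical helper.

```python
from typing import Tuple

KIB = 1024
GIB = 1024 ** 3


def index_objects(partition_bytes: int, column_index_kb: int = 64) -> Tuple[int, int]:
    # Assumed: one IndexInfo per column_index_kb of partition data (rounded up) ...
    block = column_index_kb * KIB
    entries = -(-partition_bytes // block)  # ceiling division
    # ... and two ByteBuffers per IndexInfo (first and last clustering name).
    return entries, 2 * entries


entries, buffers = index_objects(int(6.4 * GIB))
print(f"{entries:,} IndexInfo objects, {buffers:,} ByteBuffers")
```

For a 6.4 GiB partition this gives 104,858 entries and 209,716 buffers, consistent with the ~100K and ~200K figures above.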
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)