[ 
https://issues.apache.org/jira/browse/CASSANDRA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552525#comment-14552525
 ] 

Aleksey Yeschenko commented on CASSANDRA-9107:
----------------------------------------------

Go ahead.

> More accurate row count estimates
> ---------------------------------
>
>                 Key: CASSANDRA-9107
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9107
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>             Fix For: 2.1.x
>
>         Attachments: 9107-cassandra2-1.patch, 9107-v2.txt
>
>
> Currently the estimated row count from cfstats is the sum of the number of 
> rows in all the sstables. This becomes very inaccurate with wide rows or 
> heavily updated datasets, since the same partition can exist in many 
> sstables. For example:
> {code}
> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> create TABLE wide (key text PRIMARY KEY , value text) WITH compaction = 
> {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, 
> 'max_threshold': 100} ;
> -------------------------------
> insert INTO wide (key, value) VALUES ('key', 'value');
> // flush
> // cfstats output: Number of keys (estimate): 1 (128 in older version from index)
> insert INTO wide (key, value) VALUES ('key', 'value');
> // flush
> // cfstats output: Number of keys (estimate): 2 (256 in older version from index)
> ... etc
> {code}
> Previously it used the index, but it still computed the estimate per sstable 
> and summed the results, which was inaccurate in the same way as the number of 
> sstables grew (only much worse). 
> With new versions of sstables we can merge the cardinalities to resolve this, 
> with a slight hit to accuracy in the case where every sstable has completely 
> unique partitions.
> Furthermore, I think it would take pretty minimal effort to include the number 
> of rows in the memtables in this count. We won't have cardinality merging 
> between memtables and sstables, but I would consider that a relatively minor 
> negative.
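
To illustrate the difference between summing per-sstable estimates and merging cardinalities, here is a minimal sketch in Python. It assumes a HyperLogLog-style estimator per sstable; the issue text does not say which estimator the sstables carry, and the names HyperLogLog, sketch_of, summed_estimate and merged_estimate are hypothetical, not Cassandra APIs.

```python
import hashlib
import math
from functools import reduce


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, key):
        # 64-bit hash of the partition key
        h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of first 1 bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging is a register-wise max, so a key seen in both
        # sketches is only counted once.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


def sketch_of(keys):
    """Build the sketch a single (hypothetical) sstable would carry."""
    hll = HyperLogLog()
    for key in keys:
        hll.add(key)
    return hll


def summed_estimate(sketches):
    # Old behaviour described above: estimate per sstable, then add up.
    return sum(s.estimate() for s in sketches)


def merged_estimate(sketches):
    # Proposed behaviour: merge the sketches first, then estimate once.
    return reduce(lambda a, b: a.merge(b), sketches, HyperLogLog()).estimate()


if __name__ == "__main__":
    # Five flushes that each wrote the same partition 'key'
    sstables = [sketch_of(["key"]) for _ in range(5)]
    print("summed:", round(summed_estimate(sstables)))
    print("merged:", round(merged_estimate(sstables)))
```

In this sketch the summed figure grows with every flush of the same partition, while the merged figure stays at the number of distinct partitions. Memtable counts could then be added on top of the merged value, which would reintroduce some double counting, as noted in the description.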



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
