[ https://issues.apache.org/jira/browse/CASSANDRA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552525#comment-14552525 ]
Aleksey Yeschenko commented on CASSANDRA-9107:
----------------------------------------------

Go ahead.

> More accurate row count estimates
> ---------------------------------
>
>                 Key: CASSANDRA-9107
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9107
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>             Fix For: 2.1.x
>
>         Attachments: 9107-cassandra2-1.patch, 9107-v2.txt
>
>
> Currently the estimated row count from cfstats is the sum of the number of
> rows in all the sstables. This becomes very inaccurate with wide rows or
> heavily updated datasets, since the same partition can exist in many
> sstables and is counted once per sstable. For example:
> {code}
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
> CREATE TABLE test.wide (key text PRIMARY KEY, value text) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, 'max_threshold': 100};
> -------------------------------
> INSERT INTO test.wide (key, value) VALUES ('key', 'value');
> // flush
> // cfstats output: Number of keys (estimate): 1 (128 in older version from index)
> INSERT INTO test.wide (key, value) VALUES ('key', 'value');
> // flush
> // cfstats output: Number of keys (estimate): 2 (256 in older version from index)
> ... etc
> {code}
> Previously the estimate came from the index, but it was still computed per
> sstable and summed, so it also became inaccurate as the number of sstables
> grew (only by a much larger margin).
> With new versions of sstables we can merge the cardinalities to resolve this,
> with a slight hit to accuracy in the case where every sstable has completely
> unique partitions.
> Furthermore, I think it would be pretty minimal effort to include the number
> of rows in the memtables in this count. We won't have the cardinality merging
> between memtables and sstables, but I would consider that a relatively minor
> negative.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
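
The difference between summing per-sstable counts and merging cardinalities can be illustrated with a small sketch. This is not Cassandra's code: it is a toy HyperLogLog written for illustration, and the names `new_sketch`, `add`, `merge`, `estimate`, and `estimates`, as well as the precision `P = 10`, are assumptions made here. Each "sstable" is modelled as a plain list of partition keys.

```python
import hashlib
import math

P = 10        # bits used to pick a register (assumed precision)
M = 1 << P    # number of registers


def new_sketch():
    return [0] * M


def add(sketch, key):
    # 64-bit hash of the partition key
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    idx = h >> (64 - P)                       # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # position of the first 1 bit
    sketch[idx] = max(sketch[idx], rank)


def merge(a, b):
    # Merging two sketches is an element-wise max, so a key present in
    # both inputs contributes only once to the merged estimate.
    return [max(x, y) for x, y in zip(a, b)]


def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)        # small-range (linear counting) correction
    return raw


def estimates(sstables):
    """Return (sum of per-sstable estimates, estimate of the merged sketch)."""
    sketches = []
    for keys in sstables:
        s = new_sketch()
        for k in keys:
            add(s, k)
        sketches.append(s)
    merged = new_sketch()
    for s in sketches:
        merged = merge(merged, s)
    summed = sum(round(estimate(s)) for s in sketches)
    return summed, round(estimate(merged))


# The same partition flushed into 30 sstables, as in the example above.
print(estimates([["key"]] * 30))   # prints (30, 1)
```

The summed figure grows with every flush, while the merged figure stays at one partition. When every sstable holds entirely distinct partitions, the merged value is only an approximation of the true total, which is the "slight hit to accuracy" mentioned in the description.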