Hi, I am trying to get some baselines for capacity planning. The approach I took was to insert an increasing number of rows into a replica of the table to be sized, watch the size of the "data" directory (after running nodetool flush and compact), and calculate the average size per row (total directory size / row count). Can this be considered a valid approach for extrapolating future data growth?
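For reference, this is roughly the calculation I am doing, sketched in Python. The data directory path and row count are placeholders for my actual test runs, not real values:

    import os

    def dir_size_bytes(path):
        """Total size of all files under `path` (the table's data directory)."""
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total

    DATA_DIR = "/var/lib/cassandra/data/mykeyspace/abc"  # placeholder path
    ROWS_INSERTED = 1000000  # rows loaded before nodetool flush/compact

    avg_bytes_per_row = dir_size_bytes(DATA_DIR) / ROWS_INSERTED
    print("avg bytes/row:", avg_bytes_per_row)

    # Extrapolation for capacity planning: projected on-disk size for N future rows
    expected_rows = 500000000  # placeholder growth target
    print("projected GB:", avg_bytes_per_row * expected_rows / 1024**3)

The idea is to repeat this at a few different row counts and check that the bytes/row figure stays roughly constant before using it to project forward.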
Related to this, is there any information we can gather from the partition-size section of cfhistograms? Snipped output for my table:

Partition Size (bytes)
   642 bytes: 221
   770 bytes: 2328
   924 bytes: 328858
   ..
  8239 bytes: 153178
   ...
 24601 bytes: 16973
 29521 bytes: 10805
   ...
219342 bytes: 23
263210 bytes: 6
315852 bytes: 4

The sizes reported in cfhistograms vary widely from the value calculated using the approach above (roughly 2 KB/row on average). Could this difference be due to compression, or are there other factors at play here? What would be the typical use/interpretation of the "partition size" metric?

The table definition is:

CREATE TABLE abc (
  key1 text,
  col1 text,
  PRIMARY KEY ((key1))
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Thanks,
Joseph
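P.S. For completeness, here is a rough sketch (Python) of how I am turning the snipped histogram output above into an average partition size for the comparison. The bucket list is incomplete because of the snipping, and I am treating each reported size as the value for its whole bucket, so this is only approximate:

    # Weighted average partition size from the (snipped) cfhistograms buckets above.
    # Each entry is (partition size in bytes, number of partitions in that bucket).
    buckets = [
        (642, 221),
        (770, 2328),
        (924, 328858),
        (8239, 153178),
        (24601, 16973),
        (29521, 10805),
        (219342, 23),
        (263210, 6),
        (315852, 4),
    ]

    total_partitions = sum(count for _size, count in buckets)
    weighted_bytes = sum(size * count for size, count in buckets)

    print("partitions counted:", total_partitions)
    print("avg partition size (bytes):", weighted_bytes / total_partitions)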