Hi,

I am trying to get some baselines for capacity planning. The approach I
took was to insert an increasing number of rows into a replica of the table
to be sized, watch the size of the "data" directory (after doing nodetool
flush and compact), and calculate the average size per row (total directory
size / count of rows). Can this be considered a valid approach to
extrapolate future growth of data?
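
For reference, here is roughly how I am doing that measurement. This is only
a minimal Python sketch; the keyspace name "ks", the data path
/var/lib/cassandra/data, and the directory-matching logic are assumptions for
illustration, since the actual data layout depends on the Cassandra version
and configuration:

# Minimal sketch of the per-row baseline measurement described above.
# Assumes keyspace "ks", table "abc", and data files under
# /var/lib/cassandra/data; adjust these for the actual install.
import os
import subprocess

KEYSPACE = "ks"
TABLE = "abc"
DATA_DIR = "/var/lib/cassandra/data"

def table_dir_bytes():
    # Sum the on-disk size of every file under the table's directory.
    total = 0
    for root, _dirs, files in os.walk(os.path.join(DATA_DIR, KEYSPACE)):
        if TABLE not in os.path.basename(root):
            continue
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def measure(row_count):
    # Flush memtables and compact so the directory reflects all rows,
    # then report the average on-disk bytes per row.
    subprocess.check_call(["nodetool", "flush", KEYSPACE, TABLE])
    subprocess.check_call(["nodetool", "compact", KEYSPACE, TABLE])
    size = table_dir_bytes()
    print("%d rows -> %d bytes on disk, %.1f bytes/row"
          % (row_count, size, float(size) / row_count))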

Related to this, is there any information we can gather from the
partition-size section of cfhistograms (snipped output for my table below)?

Partition Size (bytes)
     642 bytes: 221
     770 bytes: 2328
     924 bytes: 328858
  ...
    8239 bytes: 153178
  ...
   24601 bytes: 16973
   29521 bytes: 10805
  ...
  219342 bytes: 23
  263210 bytes: 6
  315852 bytes: 4

The partition sizes from cfhistograms show a wide variation from the value
calculated using the approach above (avg ~2 KB/row). Could this difference
be due to compression, or are there other factors at play here? What would
be the typical use/interpretation of the "partition size" metric?
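
For comparison, a count-weighted average partition size can be computed from
the histogram buckets. A minimal sketch, using only the snipped rows above,
so the resulting number is only illustrative:

# Count-weighted average partition size from the cfhistograms buckets.
# Each entry is (bucket size in bytes, number of partitions in that bucket),
# taken from the snipped output above.
buckets = [
    (642, 221),
    (770, 2328),
    (924, 328858),
    (8239, 153178),
    (24601, 16973),
    (29521, 10805),
    (219342, 23),
    (263210, 6),
    (315852, 4),
]

partitions = sum(count for _size, count in buckets)
weighted = sum(size * count for size, count in buckets)
print("partitions: %d" % partitions)
print("weighted average partition size: %.0f bytes"
      % (float(weighted) / partitions))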

The table definition is as follows:

CREATE TABLE abc (
  key1 text,
  col1 text,
  PRIMARY KEY ((key1))
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Thanks,
Joseph
