Hello everybody,
I am trying to understand how disks fill up in Cassandra.
If we run Cassandra in a cluster of N (commodity) servers and each
server has the same DataFileDirectories configuration:
<DataFileDirectories>
<DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
</DataFileDirectories>
and the same disk size, what should I do to never get a full disk
on any of those servers?
Does scaling Cassandra only require me to do the following:
I just watch the usage percentage of the partition containing
/var/lib/cassandra/data on each server, and if one of the servers reports
a usage greater than a threshold (say 95%), then I just have to add one
extra node (node N+1) to my cluster?
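
To make the check I have in mind concrete, here is a minimal sketch in Python. The function names are my own (hypothetical), and it only looks at the local partition; running it on every server and collecting the results is left out.

```python
import shutil

DATA_DIR = "/var/lib/cassandra/data"  # path from the configuration above
THRESHOLD = 0.95                      # the 95% example threshold

def usage_fraction(used_bytes, total_bytes):
    """Return the used share of a partition as a value between 0 and 1."""
    return used_bytes / total_bytes

def is_over_threshold(used_bytes, total_bytes, threshold=THRESHOLD):
    """True when the used share is strictly above the threshold."""
    return usage_fraction(used_bytes, total_bytes) > threshold

def partition_is_over_threshold(path=DATA_DIR, threshold=THRESHOLD):
    """Check the partition that contains `path` on the local server."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_over_threshold(usage.used, usage.total, threshold)
```

If partition_is_over_threshold() returns True on any server, that would be my signal to add a node.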
Will the disk usage eventually stabilize, dropping from the average disk
usage 'U' it had before the node addition to the lower value 'U * N/(N+1)'?
Is it that easy? ;)
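
Here is the arithmetic I am assuming, as a small Python sketch. It assumes the total amount of data stays the same and ends up spread evenly over all N+1 nodes, which is exactly the part I am not sure about. The function name is just for illustration.

```python
def expected_usage_after_adding_node(u, n):
    """Average disk usage after growing the cluster from n to n + 1 nodes.

    Assumption: the same total data is redistributed evenly across all nodes.
    """
    return u * n / (n + 1)

# Example: 4 nodes at 95% average usage, then a 5th node is added.
print(f"{expected_usage_after_adding_node(0.95, 4):.2f}")  # prints 0.76
```

So with 4 nodes at 95%, I would expect roughly 76% per node once the data has been redistributed.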
Thanks
Alex