You cannot just keep throwing bigger and bigger disks at a cluster as it
fills up with data.  This is because data grooming tasks (compaction, for
example) increase in cost as your data density per node increases, as do
other factors that are impacted by data density (such as caches).
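
To make the compaction point concrete, here is a back-of-the-envelope
sketch in Python.  The 50% figure is the common guideline for size-tiered
compaction (a worst-case major compaction can temporarily need about as
much free space as the live data it rewrites); the node sizes are just the
numbers from your cluster below.

    # Sketch: free-space headroom for size-tiered compaction (STCS).
    # Assumption: worst case, a major compaction rewrites all live data
    # at once, so keep data at or below ~50% of the disk.
    def stcs_headroom_ok(data_gb, disk_gb):
        """True if the node keeps enough free disk for a worst-case
        major compaction under STCS."""
        return data_gb <= disk_gb * 0.5

    print(stcs_headroom_ok(51, 160))   # True: fine today
    print(stcs_headroom_ok(90, 160))   # False: tips over well before
                                       # the disk itself is "full"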

As a rule of thumb, you probably should not have more than about 1TB of
data per node on magnetic media, and not more than about 2TB per node on
SSD media.  Non-local filesystems like EBS, EFS, and S3 would push those
numbers downward.  And of course this is also workload and compaction
strategy dependent: you might do fine at higher density, or you might tip
over well before reaching these numbers.
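
If you want to watch for this, one option is a small script around
nodetool status.  A sketch in Python follows; the column layout of
nodetool status output varies across Cassandra versions (newer ones print
"GiB"/"TiB"), so treat the parsing as illustrative:

    # Sketch: flag nodes whose load exceeds a per-node density target.
    # Assumes status lines like:  UN  10.0.0.1  51.34 GB  256  ...
    import subprocess

    TARGET_GB = 1000  # ~1TB rule of thumb for magnetic media

    out = subprocess.check_output(["nodetool", "status"]).decode()
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] in ("UN", "DN"):
            try:
                value = float(parts[2])
            except ValueError:
                continue  # load column can be "?" on some nodes
            mult = {"MB": 1 / 1024.0, "GB": 1.0, "TB": 1024.0}.get(parts[3])
            if mult and value * mult > TARGET_GB:
                print("%s: %.0f GB -- over density target"
                      % (parts[1], value * mult))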

At some point you need to consider expanding your cluster by adding new
nodes.  That point might be now, and it might also be time to consider
other instance types.
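
The arithmetic behind "that point" is simple enough to sketch as well.
The growth rate and the per-node comfort target here are hypothetical
placeholders; plug in your own measurements:

    # Sketch: nodes required to keep per-node density under a target.
    import math

    def nodes_needed(total_on_disk_gb, target_gb_per_node):
        return int(math.ceil(total_on_disk_gb / float(target_gb_per_node)))

    # e.g. a cluster accruing ~150 GB of on-disk data per month
    # (replication included), with a 500 GB/node comfort target:
    for months in (6, 12, 24):
        print(months, "months ->",
              nodes_needed(150 * months, 500), "nodes")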

If you're growing your cluster because of disk capacity on the M3.2XL,
that suggests a different instance type with better storage density.  You
might not even need SSDs at your write rate; you might be able to build a
larger cluster of "smaller" instances (with higher data density) on
magnetic media.  Just keep in mind not to be tempted to fill up the really
big storage types like the D2s or HS1s: you'll fail for other reasons long
before those disks get anywhere near capacity.

Finally, in general, unless you are an experienced operator, I wouldn't
recommend trying to use non-local storage types like EBS, EFS, and S3.
They can work, and I know operators doing some pretty awesome things with
them, but they only fit certain use cases and have failure modes within
Cassandra which are not immediately obvious.


On Sun, Oct 11, 2015, 11:51 PM srungarapu vamsi <srungarapu1...@gmail.com>
wrote:

> Hi, I have a 3-node Cassandra cluster in AWS.  Each node is an m3.large
> with a 160GB hard disk.
> It has been 1 month and Cassandra has already occupied 51GB of my disk
> space.
>
> Obviously, at some point I will run out of disk space as the data keeps
> coming in.
> So, I would like to go down the path of mounting a volume (say EBS) on a
> node and pointing the data directory to that mount point.
> But even in this approach, since the EBS volume has some size 'X', at
> some point I will face the same problem of running out of disk space.
>
> In my view, the solution to this problem can be any of the following:
>
>    1. Mount an EFS (Elastic File System) volume on the Cassandra node and
>    point the Cassandra data directory at it.
>    2. Mount S3 on the Cassandra EC2 instance and point the Cassandra data
>    directory at this S3 mount point.
>    3. Create an EBS volume of very large size such that we never reach
>    its size limit.
>
> I would like to get a review of the possible solutions I thought of!
> Please suggest how you are solving the problem of running out of disk
> space in production.
>
> --
> /Vamsi
>
