Hello Kunal,

I would take a look at the following configuration options in cassandra.yaml:

Common automatic backup settings
incremental_backups:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__incremental_backups

(Default: false) Backs up data updated since the last snapshot was taken. When 
enabled, Cassandra creates a hard link to each SSTable flushed or streamed 
locally in a backups subdirectory of the keyspace data. Removing these links is 
the operator's responsibility.

snapshot_before_compaction:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__snapshot_before_compaction

(Default: false) Enables or disables taking a snapshot before each compaction. 
A snapshot is useful to back up data when there is a data format change. Be 
careful using this option: Cassandra does not clean up older snapshots 
automatically.


Advanced automatic backup setting
auto_snapshot:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__auto_snapshot

(Default: true) Enables or disables whether Cassandra takes a snapshot of the 
data before truncating a keyspace or dropping a table. To prevent data loss, 
DataStax strongly advises using the default setting. If you set auto_snapshot 
to false, you lose data on truncation or drop.
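
To see how a node is currently configured, the three settings above can be read straight from cassandra.yaml. This is only a minimal sketch: it assumes the settings appear as plain top-level `key: true|false` lines, the function name read_backup_settings is hypothetical, and the defaults are the ones quoted from the documentation above.

```python
import re

# Defaults as quoted from the DataStax documentation above.
DEFAULTS = {
    "incremental_backups": False,
    "snapshot_before_compaction": False,
    "auto_snapshot": True,
}


def read_backup_settings(yaml_text):
    """Return the backup-related settings found in cassandra.yaml text.

    Only simple top-level `key: true|false` lines are recognised;
    this is not a full YAML parser. Commented-out lines are ignored.
    """
    settings = dict(DEFAULTS)
    pattern = re.compile(r"^(\w+):\s*(true|false)\s*(#.*)?$", re.IGNORECASE)
    for line in yaml_text.splitlines():
        match = pattern.match(line)
        if match and match.group(1) in settings:
            settings[match.group(1)] = match.group(2).lower() == "true"
    return settings


# Hypothetical excerpt of a cassandra.yaml file.
sample = """
cluster_name: 'Test Cluster'
incremental_backups: true
# snapshot_before_compaction: true
auto_snapshot: true
"""
print(read_backup_settings(sample))
```

If incremental_backups comes back as true and nothing is collecting the files, that is the setting to turn off.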


nodetool also provides methods to manage snapshots. 
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsNodetool.html
See the specific commands:

  *   nodetool clearsnapshot <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsClearSnapShot.html>
      Removes one or more snapshots.
  *   nodetool listsnapshots <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsListSnapShots.html>
      Lists snapshot names, size on disk, and true size.
  *   nodetool snapshot <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsSnapShot.html>
      Takes a snapshot of one or more keyspaces, or of a table, to back up data.
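
If you want to script these three commands, a thin wrapper is enough. The sketch below assumes nodetool is on the PATH of the node it runs on; the helper names, the tag pre_cleanup, and the keyspace my_keyspace are made up for illustration. It only uses the -t (snapshot tag) option and positional keyspace names.

```python
import subprocess

SNAPSHOT_COMMANDS = ("snapshot", "listsnapshots", "clearsnapshot")


def nodetool_args(command, tag=None, keyspaces=()):
    """Build the argument list for one of the snapshot-related commands."""
    if command not in SNAPSHOT_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    args = ["nodetool", command]
    if command != "listsnapshots":
        # listsnapshots takes neither a tag nor keyspace names.
        if tag is not None:
            args += ["-t", tag]
        args += list(keyspaces)
    return args


def run_nodetool(command, tag=None, keyspaces=()):
    """Run the command against the local node and return its standard output."""
    result = subprocess.run(
        nodetool_args(command, tag, keyspaces),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Print the command lines instead of running them.
print(" ".join(nodetool_args("snapshot", tag="pre_cleanup", keyspaces=["my_keyspace"])))
print(" ".join(nodetool_args("listsnapshots")))
print(" ".join(nodetool_args("clearsnapshot", tag="pre_cleanup", keyspaces=["my_keyspace"])))
```

Calling run_nodetool("listsnapshots") on a node would return the same table you see on the command line.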

As far as I am aware, deleting the snapshot and backup directories with rm is 
perfectly safe, as long as you are careful not to delete the sstable files and 
directories that are in active use.  I think the nodetool clearsnapshot command 
is provided so that you don't accidentally delete actively used files.  When I 
last used clearsnapshot (a very long time ago), I believe it left the empty 
directory behind, but this may have been fixed in newer versions, so you might 
want to check that.
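
For the rm route, a small script can make it harder to touch live sstables by only ever looking inside directories that are literally named backups. This is a rough sketch, not a tested operational tool: the function names are hypothetical, it assumes the <data>/<keyspace>/<table>/backups layout described in the original question, and it defaults to a dry run.

```python
import os
import time


def find_backup_files(data_dir):
    """Yield paths of files that sit directly inside a `backups` directory."""
    for root, _dirs, files in os.walk(data_dir):
        if os.path.basename(root) == "backups":
            for name in files:
                yield os.path.join(root, name)


def prune_backups(data_dir, max_age_seconds, dry_run=True):
    """Select backup hard links older than max_age_seconds and optionally delete them.

    Returns the list of selected paths. With dry_run=True nothing is removed.
    Files outside `backups` directories are never considered.
    """
    cutoff = time.time() - max_age_seconds
    selected = [
        path for path in find_backup_files(data_dir)
        if os.path.getmtime(path) < cutoff
    ]
    if not dry_run:
        for path in selected:
            os.remove(path)
    return selected
```

A first pass such as prune_backups(data_dir, 7 * 24 * 3600), with data_dir pointing at your Cassandra data volume, only lists what would go; pass dry_run=False once the list looks right.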

HTH
-Razi


From: Jonathan Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, January 10, 2017 at 12:26 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Backups eating up disk space

If you remove the files from the backup directory, you will not lose data if a 
node goes down.  They're hard links to the same files that are in your data 
directory, and are created when an sstable is written to disk.  At that time 
they take up (almost) no extra space, so they aren't a big deal.  But when the 
sstable gets compacted away, the hard links stick around and keep the old data 
on disk, so the space is never freed.
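
The hard-link behaviour is easy to reproduce outside Cassandra. The sketch below is only an illustration on a temporary directory: the file name is made up, and removing the original file stands in for compaction deleting an old sstable.

```python
import os
import tempfile


def hard_link_demo():
    """Show that a hard link keeps the data alive after the original name is removed.

    Returns (link count before removal, link count after removal,
    size in bytes still reachable through the backup link).
    """
    with tempfile.TemporaryDirectory() as data_dir:
        backups_dir = os.path.join(data_dir, "backups")
        os.mkdir(backups_dir)

        # Hypothetical sstable file with 1024 bytes of data.
        sstable = os.path.join(data_dir, "example-Data.db")
        with open(sstable, "wb") as handle:
            handle.write(b"x" * 1024)

        # What incremental backups do: a second name for the same file.
        backup = os.path.join(backups_dir, "example-Data.db")
        os.link(sstable, backup)
        links_before = os.stat(sstable).st_nlink

        # Stand-in for compaction removing the old sstable.
        os.remove(sstable)
        links_after = os.stat(backup).st_nlink
        size_after = os.path.getsize(backup)
        return links_before, links_after, size_after


print(hard_link_demo())
```

On a filesystem that supports hard links, the link count drops from 2 to 1 while the full 1024 bytes stay reachable through the backups path, which is why the disk usage keeps growing.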

Usually you use incremental backups as a means of moving the sstables off the 
node to a backup location.  If you're not doing anything with them, they're 
just wasting space and you should disable incremental backups.
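
Moving the files off the node can be as simple as copy-then-delete. A minimal sketch, assuming the backup location is reachable as a local or mounted directory; the function name archive_backups and both directory arguments are hypothetical, and a real setup would more likely ship the files to remote storage.

```python
import os
import shutil


def archive_backups(backups_dir, archive_dir):
    """Copy each file in backups_dir to archive_dir, then remove the local hard link.

    A local file is removed only after its copy exists with the same size.
    Returns the names of the files that were moved.
    """
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(backups_dir)):
        source = os.path.join(backups_dir, name)
        if not os.path.isfile(source):
            continue  # skip subdirectories
        target = os.path.join(archive_dir, name)
        shutil.copy2(source, target)
        if os.path.getsize(target) == os.path.getsize(source):
            os.remove(source)
            moved.append(name)
    return moved
```

Run periodically per table, this keeps the backups directory close to empty while the archive accumulates the sstables.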

Some people take snapshots then rely on incremental backups.  Others use the 
tablesnap utility which does sort of the same thing.

On Tue, Jan 10, 2017 at 9:18 AM Kunal Gangakhedkar 
<kgangakhed...@gmail.com<mailto:kgangakhed...@gmail.com>> wrote:
Thanks for quick reply, Jon.

But, what about in case of node/cluster going down? Would there be data loss if 
I remove these files manually?

How is it typically managed in production setups?
What are the best-practices for the same?
Do people take snapshots on each node before removing the backups?

This is my first production deployment - so, still trying to learn.

Thanks,
Kunal

On 10 January 2017 at 21:36, Jonathan Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:
You can just delete them off the filesystem (rm)

On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar 
<kgangakhed...@gmail.com<mailto:kgangakhed...@gmail.com>> wrote:
Hi all,

We have a 3-node cassandra cluster with incremental backup set to true.
Each node has 1TB data volume that stores cassandra data.

The load in the output of 'nodetool status' comes up at around 260GB each node.
All our keyspaces use replication factor = 3.

However, the df output shows the data volumes consuming around 850GB of space.
I checked the keyspace directory structures - most of the space goes in 
<CASS_DATA_VOL>/data/<KEYSPACE>/<CF>/backups.

We have never manually run snapshots.

What is the typical procedure to clear the backups?
Can it be done without taking the node offline?

Thanks,
Kunal
