Re: Cassandra backup to alternate location

2018-06-28 Thread Jeff Jirsa
No - they'll hardlink into the snapshot folder on each data directory. They
are true hardlinks, so even if you could move it, it'd still be on the same
filesystem.

Typical behavior is to issue a snapshot, and then copy the data out as
needed (using something like https://github.com/JeremyGrosser/tablesnap ).
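Jeff's point about true hardlinks is easy to demonstrate with a throwaway shell experiment (temp files only, not real Cassandra data): the snapshot copy shares the original file's inode, so it costs no extra space and cannot be moved to a different filesystem.

```shell
# Simulate what "nodetool snapshot" does: hardlink an sstable-like file
# into a snapshots/ subdirectory. Paths are illustrative only.
workdir=$(mktemp -d)
echo "sstable contents" > "$workdir/ks-cf-Data.db"
mkdir -p "$workdir/snapshots/tag1"
ln "$workdir/ks-cf-Data.db" "$workdir/snapshots/tag1/ks-cf-Data.db"

# Both names point at the same inode; the file's link count is now 2.
inode_orig=$(stat -c '%i' "$workdir/ks-cf-Data.db")
inode_snap=$(stat -c '%i' "$workdir/snapshots/tag1/ks-cf-Data.db")
links=$(stat -c '%h' "$workdir/ks-cf-Data.db")
echo "inode_orig=$inode_orig inode_snap=$inode_snap links=$links"
```

Because the two directory entries share one inode, deleting the live sstable after compaction still leaves the snapshot copy readable.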

On Thu, Jun 28, 2018 at 10:00 AM, Lohchab, Sanjeev  wrote:

> Hi All,
>
>
>
> I am trying to back up our Cassandra DB, but it is saving the snapshots in
> the default location.
>
> Is there any way we can specify the location where we want to store the
> snapshots?
>
>
>
> Regards
>
> Sanjeev
>


Re: Cassandra backup via snapshots in production

2014-12-01 Thread Robert Coli
On Thu, Nov 27, 2014 at 2:34 AM, Jens Rantil jens.ran...@tink.se wrote:

 Late answer; You can find my backup script here:
 https://gist.github.com/JensRantil/a8150e998250edfcd1a3


Why not use the much more robustly designed and maintained community-based
project, tablesnap?

https://github.com/JeremyGrosser/tablesnap

=Rob


Re: Cassandra backup via snapshots in production

2014-12-01 Thread Jens Rantil
On Mon, Dec 1, 2014 at 8:39 PM, Robert Coli rc...@eventbrite.com wrote:

 Why not use the much more robustly designed and maintained community based
 project, tablesnap?


For two reasons:

   - Because I am tired of the deployment model of Python apps, which
   requires me to set up virtual environments.
   - Because, AFAIK, it did not support (asymmetric) encryption before
   uploading.

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook https://www.facebook.com/tink.se Linkedin
http://www.linkedin.com/company/2735919 Twitter https://twitter.com/tink


RE: Cassandra backup via snapshots in production

2014-11-27 Thread Ngoc Minh VO
Thanks a lot for your answers!

What we plan to do is:

-  auto_snapshot = true

-  if a human error happens on day D-5:

o   we will bring the cluster offline

o   purge all data

o   import the snapshots taken prior to D-5 (and delete the snapshots taken after D-5)

o   upload all missing data between D-5 and D

o   bring the cluster back online

Do you think it would work?
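The snapshot-selection step of the plan above can be sketched as below, under the assumption that snapshot tags carry an ISO date (tag names and the D-5 cutoff are hypothetical; the stop/purge/import steps themselves still need your normal ops tooling):

```shell
# Keep only snapshots taken on or before the last known-good day (D-5).
# ISO dates compare correctly as plain strings, so awk can filter them.
select_good_snapshots() {
  cutoff="$1"; shift
  printf '%s\n' "$@" | awk -v c="$cutoff" '$0 <= c'
}

# Example: a bad write happened on 2014-11-25; last good day is 2014-11-22.
good=$(select_good_snapshots 2014-11-22 \
  2014-11-20 2014-11-21 2014-11-22 2014-11-25 2014-11-26)
echo "$good"
```

Everything the filter rejects is what the plan says to delete before re-importing.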

From: Jens Rantil [mailto:jens.ran...@tink.se]
Sent: mardi 25 novembre 2014 10:03
To: user@cassandra.apache.org
Subject: Re: Cassandra backup via snapshots in production

 Truncate does trigger snapshot creation though

Doesn’t it? With “auto_snapshot: true” it should.

———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook Linkedin Twitter


On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan doanduy...@gmail.com wrote:

True

Delete in CQL just create tombstone so from the storage engine pov it's just 
adding some physical columns

Truncate does trigger snapshot creation though
On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote:
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote:
 The main purpose is to protect us from human errors (eg. unexpected 
 manipulations: delete, drop tables, …).

If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml 
will be enough to protect you.

OP includes delete in their list of unexpected manipulations, and 
auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba



This message and any attachments (the message) is
intended solely for the intended addressees and is confidential. 
If you receive this message in error,or are not the intended recipient(s), 
please delete it and any copies from your systems and immediately notify
the sender. Any unauthorized view, use that does not comply with its purpose, 
dissemination or disclosure, either whole or partial, is prohibited. Since the 
internet 
cannot guarantee the integrity of this message which may not be reliable, BNP 
PARIBAS 
(and its subsidiaries) shall not be liable for the message if modified, changed 
or falsified. 
Do not print this message unless it is necessary,consider the environment.

--

Ce message et toutes les pieces jointes (ci-apres le message) 
sont etablis a l'intention exclusive de ses destinataires et sont confidentiels.
Si vous recevez ce message par erreur ou s'il ne vous est pas destine,
merci de le detruire ainsi que toute copie de votre systeme et d'en avertir
immediatement l'expediteur. Toute lecture non autorisee, toute utilisation de 
ce message qui n'est pas conforme a sa destination, toute diffusion ou toute 
publication, totale ou partielle, est interdite. L'Internet ne permettant pas 
d'assurer
l'integrite de ce message electronique susceptible d'alteration, BNP Paribas 
(et ses filiales) decline(nt) toute responsabilite au titre de ce message dans 
l'hypothese
ou il aurait ete modifie, deforme ou falsifie. 
N'imprimez ce message que si necessaire, pensez a l'environnement.


Re: Cassandra backup via snapshots in production

2014-11-27 Thread Jens Rantil
Late answer; You can find my backup script here: 
https://gist.github.com/JensRantil/a8150e998250edfcd1a3


Basically you need to set S3_BUCKET and PGP_KEY_RECIPIENT, configure s3cmd (using 
s3cmd --configure) and then issue `./backup-keyspace.sh your-keyspace` to 
back the keyspace up to S3. We run the script periodically on every node.




Regarding “s3cmd --configure”, I executed it once and then copied “~/.s3cfg” to 
all nodes.




Like I said, there’s lots of love that can be put into a backup system. Note 
that the script has the following limitations:

 * It does not checksum the files. However, the s3cmd website states that by 
default it compares MD5 and file size on upload.

 * It does not do purging of files on S3 (which you could configure using 
“Object Lifecycles”).

 * It does not warn you when a backup fails. Check your logs periodically.

 * It does not do any advanced logging. Make sure to pipe the output to a file 
or the `syslog` utility.

 * It does not do continuous/point-in-time backup.
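The first limitation can be closed with a checksum pass of your own. A minimal sketch, using local stand-in files instead of a real S3 round-trip and assuming `md5sum` is available:

```shell
workdir=$(mktemp -d)
echo "sstable bytes" > "$workdir/local-Data.db"
# Stand-in for fetching the uploaded object back (or for its reported hash).
cp "$workdir/local-Data.db" "$workdir/downloaded-Data.db"

# Compare content hashes; against real S3 you could compare the local MD5
# with the ETag S3 reports for non-multipart uploads.
md5_local=$(md5sum "$workdir/local-Data.db" | awk '{print $1}')
md5_remote=$(md5sum "$workdir/downloaded-Data.db" | awk '{print $1}')
echo "$md5_local $md5_remote"
```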




That said, it does its job for us for now.




Feel free to propose improvements!




Cheers,

Jens


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook Linkedin Twitter

On Fri, Nov 21, 2014 at 7:36 PM, William Arbaugh w...@cs.umd.edu wrote:

 Jens,
 I'd be interested in seeing your script. We've been thinking of doing exactly 
 that but uploading to Glacier instead.
 Thanks, Bill
 On Nov 21, 2014, at 11:40 AM, Jens Rantil jens.ran...@tink.se wrote:
 
  The main purpose is to protect us from human errors (eg. unexpected 
  manipulations: delete, drop tables, …).
 
 If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml 
 will be enough to protect you.
 
 Regarding backup, I have a small script that creates a named snapshot and 
 for each sstable; encrypts, uploads to S3 and deletes the snapshotted 
 sstable. It took me an hour to write and roll out to all our nodes. The 
 whole process is currently logged, but eventually I will also send an e-mail 
 if backup fails.
 
 ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: 
 +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
 
 
 On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com 
 wrote:
 
 Hello all,
 
 We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 
 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
 
 The main purpose is to protect us from human errors (e.g. unexpected 
 manipulations: delete, drop tables, …).
 
 We are thinking of:
 
 -  Backup: add a 2TB HDD on each node for C* daily/weekly snapshots.
 
 -  Restore: load the most recent snapshots or the latest “non-corrupted” 
 ones and replay missing data imports from another data source.
 
 We would like to know if anybody is using Cassandra’s backup feature in 
 production and could share their experience with us.
 
 Your help would be greatly appreciated.
 
 Best regards,
 Minh
 
 

Re: Cassandra backup via snapshots in production

2014-11-25 Thread DuyHai Doan
True

Delete in CQL just create tombstone so from the storage engine pov it's
just adding some physical columns

Truncate does trigger snapshot creation though
On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote:

  The main purpose is to protect us from human errors (eg. unexpected
 manipulations: delete, drop tables, …).

 If that is the main purpose, having “auto_snapshot: true” in
 cassandra.yaml will be enough to protect you.


 OP includes delete in their list of unexpected manipulations, and
 auto_snapshot: true will not protect you in any way from DELETE.

 =Rob
 http://twitter.com/rcolidba



Re: Cassandra backup via snapshots in production

2014-11-25 Thread Jens Rantil
 Truncate does trigger snapshot creation though




Doesn’t it? With “auto_snapshot: true” it should.




———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook Linkedin Twitter

On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan doanduy...@gmail.com wrote:

 True
 Delete in CQL just create tombstone so from the storage engine pov it's
 just adding some physical columns
 Truncate does trigger snapshot creation though
 On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote:
 On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote:

  The main purpose is to protect us from human errors (eg. unexpected
 manipulations: delete, drop tables, …).

 If that is the main purpose, having “auto_snapshot: true” in
 cassandra.yaml will be enough to protect you.


 OP includes delete in their list of unexpected manipulations, and
 auto_snapshot: true will not protect you in any way from DELETE.

 =Rob
 http://twitter.com/rcolidba


Re: Cassandra backup via snapshots in production

2014-11-21 Thread Jens Rantil
 The main purpose is to protect us from human errors (eg. unexpected 
 manipulations: delete, drop tables, …).




If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml 
will be enough to protect you.




Regarding backup, I have a small script that creates a named snapshot and for 
each sstable; encrypts, uploads to S3 and deletes the snapshotted sstable. It 
took me an hour to write and roll out to all our nodes. The whole process is 
currently logged, but eventually I will also send an e-mail if backup fails.


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook Linkedin Twitter

On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com
wrote:

 Hello all,
 We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 
 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
 The main purpose is to protect us from human errors (eg. unexpected 
 manipulations: delete, drop tables, …).
 We are thinking of:
 -  Backup: add a 2TB HDD on each node for C* daily/weekly snapshots.
 -  Restore: load the most recent snapshots or latest “non-corrupted” 
 ones and replay missing data imports from other data source.
 We would like to know if anybody is using Cassandra’s backup feature in 
 production and could share their experience with us.
 Your help would be greatly appreciated.
 Best regards,
 Minh

Re: Cassandra backup via snapshots in production

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote:

  The main purpose is to protect us from human errors (eg. unexpected
 manipulations: delete, drop tables, …).

 If that is the main purpose, having “auto_snapshot: true” in
 cassandra.yaml will be enough to protect you.


OP includes delete in their list of unexpected manipulations, and
auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba
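For reference, the setting debated in this thread is a single boolean in cassandra.yaml; a minimal fragment (the comment is mine, summarizing the thread's conclusion):

```yaml
# cassandra.yaml (fragment)
# When true, a snapshot is taken automatically before a TRUNCATE or a
# DROP of a keyspace/table. It does nothing for row-level DELETEs,
# which only write tombstones.
auto_snapshot: true
```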


Re: Cassandra backup via snapshots in production

2014-11-19 Thread Robert Coli
On Tue, Nov 18, 2014 at 6:50 AM, Ngoc Minh VO ngocminh...@bnpparibas.com
wrote:

   We are looking for a solution to backup data in our C* cluster (v2.0.x,
 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).

 The main purpose is to protect us from human errors (eg. unexpected
 manipulations: delete, drop tables, …).


https://github.com/JeremyGrosser/tablesnap

=Rob


Re: cassandra backup

2013-12-06 Thread Michael Theroux
Hi Marcelo,

Cassandra provides an eventually consistent model for backups.  You can do 
staggered backups of data, with the idea that if you restore a node, and then 
do a repair, your data will once again be consistent.  Cassandra will not 
automatically copy the data to other nodes (other than via hinted handoff).  
You should manually run repair after restoring a node.

You should take snapshots when doing a backup, as it keeps the data you are 
backing up relevant to a single point in time; otherwise compaction could 
add or delete files on you mid-backup, or worse, I imagine, attempt to access an 
SSTable mid-write.  Snapshots work by using hardlinks, and don't take additional 
storage to perform.  In our process we create the snapshot, perform the backup, 
and then clear the snapshot.
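The snapshot / backup / clearsnapshot cycle described above can be sketched as follows. The tag naming and the copy step are illustrative, the nodetool calls are only attempted when nodetool is actually on the PATH, and note that clearsnapshot's `-t` flag only exists in newer Cassandra versions (older ones clear by positional tag or clear everything):

```shell
# Date-stamped snapshot tag so repeated runs don't collide.
tag="backup-$(date +%Y%m%d)"

if command -v nodetool >/dev/null 2>&1; then
  nodetool snapshot -t "$tag"        # hardlinks live sstables under snapshots/<tag>/
  # ... copy <data_dir>/<ks>/<cf>/snapshots/$tag to backup storage here ...
  nodetool clearsnapshot -t "$tag"   # drop the hardlinks once the copy is safe
fi
echo "$tag"
```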

One thing to keep in mind in your S3 cost analysis is that, even though storage 
is cheap, reads/writes to S3 are not (especially writes).  If you are using 
LeveledCompaction, or otherwise have a ton of SSTables, some people have 
encountered increased costs moving the data to S3.

Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync data 
to.  Thus far this has worked very well for us.

-Mike



On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:
 
Hello everyone,

    I am trying to create backups of my data on AWS. My goal is to store the 
backups on S3 or Glacier, as it's cheap to store this kind of data. So, if I 
have a cluster with N nodes, I would like to copy data from all N nodes to S3 
and be able to restore later. I know Priam does that (we were using it), but I 
am using the latest Cassandra version and we plan to use DSE at some point, so I 
am not sure Priam fits this case.
    I took a look at the docs: 
http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html
 
    And I am trying to understand if it's really necessary to take a snapshot to 
create my backup. Suppose I do a flush and copy the sstables from each node, one 
by one, to S3. Not all at the same time, but one by one. 
    When I try to restore my backup, data from node 1 will be older than data 
from node 2. Will this cause problems? AFAIK, if I am using a replication 
factor of 2, for instance, and Cassandra sees data from node X only, it will 
automatically copy it to other nodes, right? Is there any chance of Cassandra 
nodes becoming corrupt somehow if I do my backups this way?

Best regards,
Marcelo Valle.

Re: cassandra backup

2013-12-06 Thread Rahul Menon
You should look at this - https://github.com/amorton/cassback - I don't
believe it's set up to use 1.2.10 and above, but I believe it just needs
small tweaks to get it running.

Thanks
Rahul


On Fri, Dec 6, 2013 at 7:09 PM, Michael Theroux mthero...@yahoo.com wrote:

 Hi Marcelo,

 Cassandra provides an eventually consistent model for backups.  You can
 do staggered backups of data, with the idea that if you restore a node, and
 then do a repair, your data will be once again consistent.  Cassandra will
 not automatically copy the data to other nodes (other than via hinted
 handoff).  You should manually run repair after restoring a node.

 You should take snapshots when doing a backup, as it keeps the data you
 are backing up relevant to a single point in time, otherwise compaction
 could add/delete files one you mid-backup, or worse, I imagine attempt to
 access a SSTable mid-write.  Snapshots work by using links, and don't take
 additional storage to perform.  In our process we create the snapshot,
 perform the backup, and then clear the snapshot.

 One thing to keep in mind in your S3 cost analysis is that, even though
 storage is cheap, reads/writes to S3 are not (especially writes).  If you
 are using LeveledCompaction, or otherwise have a ton of SSTables, some
 people have encountered increased costs moving the data to S3.

 Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync
 data to.  Thus far this has worked very well for us.

 -Mike


   On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:
   Hello everyone,

 I am trying to create backups of my data on AWS. My goal is to store
 the backups on S3 or glacier, as it's cheap to store this kind of data. So,
 if I have a cluster with N nodes, I would like to copy data from all N
 nodes to S3 and be able to restore later. I know Priam does that (we were
 using it), but I am using the latest cassandra version and we plan to use
 DSE some time, I am not sure Priam fits this case.
 I took a look at the docs:
 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html

 And I am trying to understand if it's really needed to take a snapshot
 to create my backup. Suppose I do a flush and copy the sstables from each
 node, 1 by one, to s3. Not all at the same time, but one by one.
 When I try to restore my backup, data from node 1 will be older than
 data from node 2. Will this cause problems? AFAIK, if I am using a
 replication factor of 2, for instance, and Cassandra sees data from node X
 only, it will automatically copy it to other nodes, right? Is there any
 chance of cassandra nodes become corrupt somehow if I do my backups this
 way?

 Best regards,
 Marcelo Valle.





Re: cassandra backup

2013-12-06 Thread Jonathan Haddad
I believe SSTables are written to a temporary file then moved.  If I
remember correctly, tools like tablesnap listen for the inotify event
IN_MOVED_TO.  This should handle the “try to back up an sstable while
mid-write” issue.
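The write-to-temp-then-rename pattern described above is easy to demonstrate; the filenames here are invented, but the property is general: a rename within one filesystem is atomic, so a watcher reacting to the move event (e.g. an IN_MOVED_TO handler) never sees a partial file.

```shell
workdir=$(mktemp -d)

# Writer: produce the file under a temporary name, then move it into place.
printf 'complete sstable payload' > "$workdir/tmp.na-1-big-Data.db"
mv "$workdir/tmp.na-1-big-Data.db" "$workdir/na-1-big-Data.db"

# Reader: by the time the final name exists, the content is complete.
content=$(cat "$workdir/na-1-big-Data.db")
echo "$content"
```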


On Fri, Dec 6, 2013 at 5:39 AM, Michael Theroux mthero...@yahoo.com wrote:

 Hi Marcelo,

 Cassandra provides an eventually consistent model for backups.  You can
 do staggered backups of data, with the idea that if you restore a node, and
 then do a repair, your data will be once again consistent.  Cassandra will
 not automatically copy the data to other nodes (other than via hinted
 handoff).  You should manually run repair after restoring a node.

 You should take snapshots when doing a backup, as it keeps the data you
 are backing up relevant to a single point in time, otherwise compaction
 could add/delete files one you mid-backup, or worse, I imagine attempt to
 access a SSTable mid-write.  Snapshots work by using links, and don't take
 additional storage to perform.  In our process we create the snapshot,
 perform the backup, and then clear the snapshot.

 One thing to keep in mind in your S3 cost analysis is that, even though
 storage is cheap, reads/writes to S3 are not (especially writes).  If you
 are using LeveledCompaction, or otherwise have a ton of SSTables, some
 people have encountered increased costs moving the data to S3.

 Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync
 data to.  Thus far this has worked very well for us.

 -Mike


   On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:
  Hello everyone,

 I am trying to create backups of my data on AWS. My goal is to store
 the backups on S3 or glacier, as it's cheap to store this kind of data. So,
 if I have a cluster with N nodes, I would like to copy data from all N
 nodes to S3 and be able to restore later. I know Priam does that (we were
 using it), but I am using the latest cassandra version and we plan to use
 DSE some time, I am not sure Priam fits this case.
 I took a look at the docs:
 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html

 And I am trying to understand if it's really needed to take a snapshot
 to create my backup. Suppose I do a flush and copy the sstables from each
 node, 1 by one, to s3. Not all at the same time, but one by one.
 When I try to restore my backup, data from node 1 will be older than
 data from node 2. Will this cause problems? AFAIK, if I am using a
 replication factor of 2, for instance, and Cassandra sees data from node X
 only, it will automatically copy it to other nodes, right? Is there any
 chance of cassandra nodes become corrupt somehow if I do my backups this
 way?

 Best regards,
 Marcelo Valle.





-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: cassandra backup

2013-12-06 Thread Robert Coli
On Fri, Dec 6, 2013 at 5:13 AM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:

 I am trying to create backups of my data on AWS. My goal is to store
 the backups on S3 or glacier, as it's cheap to store this kind of data. So,
 if I have a cluster with N nodes, I would like to copy data from all N
 nodes to S3 and be able to restore later.


https://github.com/synack/tablesnap

Automated backup, restore, purging, intended for use with Cassandra.

=Rob


Re: Cassandra backup

2013-02-18 Thread Michael Kjellman
There is this:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement

But you'll need to design your data model around the fact that this is only as 
granular as 1 column family

Best,
michael

From: Kanwar Sangha kan...@mavenir.com
Reply-To: user@cassandra.apache.org
Date: Monday, February 18, 2013 6:06 PM
To: user@cassandra.apache.org
Subject: Cassandra backup

Hi – We have a requirement to store around 90 days of data per user. The last 7 
days of data are going to be accessed frequently. Is there a way we can keep the 
recent data (7 days) on SSD and the rest of the data on
HDD? Do we take a snapshot every 7 days and use a separate ‘archive’ cluster 
to serve the old data and an ‘active’ cluster to serve recent data?

Any links/thoughts would be helpful.

Thanks,
Kanwar


RE: Cassandra backup

2013-02-18 Thread Kanwar Sangha
Thanks. I will look into the details.

One issue I see is that if I have only one column family, which needs only the 
last 7 days of data to be on SSD and the rest on the HDD, how will that work?
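One filesystem-level workaround people have used for the per-column-family case (not a Cassandra feature; the paths below are purely illustrative): mount the fast device somewhere and symlink the hot column family's directory onto it, leaving the other column families on the HDD. It still does not help with a 7-day split *within* a single column family.

```shell
workdir=$(mktemp -d)                  # stands in for the real mount points
mkdir -p "$workdir/ssd/ks1/hot_cf"    # directory living on the fast device
mkdir -p "$workdir/data/ks1"          # normal Cassandra data directory

# Point the hot column family's directory at the SSD-backed path.
ln -s "$workdir/ssd/ks1/hot_cf" "$workdir/data/ks1/hot_cf"

# Anything written under data/ks1/hot_cf now lands on the SSD.
touch "$workdir/data/ks1/hot_cf/na-1-big-Data.db"
ls "$workdir/ssd/ks1/hot_cf"
```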

From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: 18 February 2013 20:08
To: user@cassandra.apache.org
Subject: Re: Cassandra backup

There is this:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement

But you'll need to design your data model around the fact that this is only as 
granular as 1 column family

Best,
michael

From: Kanwar Sangha kan...@mavenir.com
Reply-To: user@cassandra.apache.org
Date: Monday, February 18, 2013 6:06 PM
To: user@cassandra.apache.org
Subject: Cassandra backup

Hi - We have a req to store around 90 days of data per user. Last 7 days of 
data is going to be accessed frequently. Is there a way we can have the recent 
data (7 days) in SSD and the rest of the data in the
HDD ? Do we take a snapshot every 7 days and use a separate 'archive' cluster 
to serve the old data and a 'active' cluster to serve recent data ?

Any links/thoughts would be helpful.

Thanks,
Kanwar


Re: Cassandra backup question regarding commitlogs

2012-05-11 Thread Vijay
The incremental backups are generated when the flush completes (only
during the flush). If the node crashes before the flush completes, then the
commit logs on the local node are the only backup for the data still in memory.
It wouldn't help to copy the commit logs across, because they are not
immutable (they are recycled).

There is commit log backup in 1.1.1 (yet to be released):
https://issues.apache.org/jira/browse/CASSANDRA-3690

Thanks,
/VJ



On Sun, Apr 29, 2012 at 3:29 PM, Roshan codeva...@gmail.com wrote:

 Hi

 Currently I am taking daily snapshot on my keyspace in production and
 already enable the incremental backups as well.

 According to the documentation, the incremental backup option will create
 an
 hard-link to the backup folder when new sstable is flushed. Snapshot will
 copy all the data/index/etc. files to a new folder.

 Question:
 What will happen (with enabling the incremental backup) when crash (due to
 any reason) the Cassandra before flushing the data as a SSTable (inserted
 data still in commitlog). In this case how can I backup/restore data?

 Do I need to backup the commitlogs as well and and replay during the server
 start to restore the data in commitlog files?

 Thanks.

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-question-regarding-commitlogs-tp7511918.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Cassandra backup queston regarding commitlogs

2012-05-01 Thread aaron morton
If you delete the commit logs you are rolling back to exactly what was in the 
snapshot. When you take a snapshot it flushes the memtables first, so there is 
nothing in the commit log that is not in the snapshot. Rolling back to a 
snapshot is a rollback to that point in time. 

If you want to restore to any point in time you need snapshots + incremental 
snapshots + commit logs (for things that have not made it to sstables). Otherwise 
there is a potential loss of data that has not been flushed to disk. This is 
different to what the DS docs are talking about. I'm not sure why they are 
saying to delete the commit log; try asking on their forum 
http://www.datastax.com/support-forums/

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/05/2012, at 12:02 PM, Roshan wrote:

 Any help regarding this is appreciated.
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-queston-regarding-commitlogs-tp7508823p7518544.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Cassandra backup queston regarding commitlogs

2012-05-01 Thread Roshan
Many thanks Aaron. I will post a support issue for them. But will keep the
snapshot + incremental backups + commitlogs to recover any failure
situation.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-queston-regarding-commitlogs-tp7508823p7518866.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Cassandra backup queston regarding commitlogs

2012-04-30 Thread Roshan
Hi Aaron

Thanks for the comments. Yes, for durability we will keep them in a safe
place. But in such a crash situation, how can I restore the data (because it
is not in an SSTable and only in the commit log)? 

Do I need to replay only that commit log when server starts after crash?
Will it override the same keys with values?

Appreciate your reply on this.

Kind Regards

/Roshan

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/deleted-tp7508823p7512499.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Cassandra backup queston regarding commitlogs

2012-04-30 Thread aaron morton
When the server starts it reads the SSTables then applies the Commit Logs. 

There is nothing you need to do other than leave the commit logs where they 
are. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/04/2012, at 6:02 PM, Roshan wrote:

 Hi Aaron
 
 Thanks for the comments. Yes for the durability will keep them in a safe
 place. But such crash situation, how can I restore the data (because those
 are not in a SSTable and only in commit log). 
 
 Do I need to replay only that commit log when server starts after crash?
 Will it override the same keys with values?
 
 Appreciate your reply on this.
 
 Kind Regards
 
 /Roshan
 
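Aaron's description of the startup order (read the SSTables first, then replay the commit log on top) also answers the question about overwriting: a replayed mutation for a key that already exists simply overwrites it. A minimal sketch with hypothetical data, not Cassandra internals:

```python
# Toy model of the startup sequence: SSTable contents are loaded first, then
# the commit log is replayed on top, so a replayed mutation for an existing
# key simply overwrites it -- which is why replay is always safe.
sstable_data = {"k1": "old-value", "k2": "v2"}    # state already flushed to disk
commit_log = [("k1", "new-value"), ("k3", "v3")]  # mutations not yet flushed

state = dict(sstable_data)      # 1. read the SSTables
for key, value in commit_log:   # 2. replay the commit log
    state[key] = value          #    the same key is overwritten harmlessly

print(state)  # {'k1': 'new-value', 'k2': 'v2', 'k3': 'v3'}
```

The same idempotent-overwrite property is what makes restarting after a crash a normal start rather than a special recovery mode.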



Re: Cassandra backup question regarding commitlogs

2012-04-30 Thread Roshan
Many thanks, Aaron. 

According to the DataStax restore documentation, they ask you to remove the
commit logs before restoring (clear all files in
/var/lib/cassandra/commitlog (by default)). 

In that case it is better not to follow this step in a server crash situation.

Thanks

/Roshan 



Re: Cassandra backup question regarding commitlogs

2012-04-30 Thread aaron morton
Can you provide a link to that page?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/05/2012, at 10:12 AM, Roshan wrote:

 Many Thanks Aaron. 
 
 According to the DataStax restore documentation, they ask you to remove the
 commit logs before restoring (clear all files in
 /var/lib/cassandra/commitlog (by default)). 
 
 In that case it is better not to follow this step in a server crash situation.
 
 Thanks
 
 /Roshan 
 



Re: Cassandra backup question regarding commitlogs

2012-04-29 Thread Tamar Fraenkel
I want to add a couple of questions regarding incremental backups:
1. If I already have a Cassandra cluster running, would changing the
incremental_backups parameter in the cassandra.yaml of each node, and then
restarting it, do the trick?
2. Assuming I am creating a daily snapshot, what is the gain from setting
incremental backup to true?

Thanks,
Tamar

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Sat, Apr 28, 2012 at 4:04 PM, Roshan codeva...@gmail.com wrote:

 Hi

 Currently I am taking daily snapshot on my keyspace in production and
 already enable the incremental backups as well.

 According to the documentation, the incremental backup option will create a
 hard link in the backup folder when a new SSTable is flushed. A snapshot will
 copy all the data/index/etc. files to a new folder.

 *Question:*
 What will happen (with incremental backup enabled) if Cassandra crashes (for
 any reason) before flushing the data as an SSTable (inserted data still in
 the commit log)? In this case, how can I back up/restore the data?

 Do I need to back up the commit logs as well and replay them during server
 start to restore the data in the commit log files?

 Thanks.



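The hard-link behaviour described above can be demonstrated directly. The sketch below uses hypothetical file names, not Cassandra's real on-disk layout; it only shows why a hard link costs no extra space and why the backup copy survives when compaction later deletes the live SSTable:

```python
import os
import tempfile

# Simulate a data directory with one "SSTable" and a backups subdirectory
# (names are illustrative, not Cassandra's actual layout).
data_dir = tempfile.mkdtemp()
backups = os.path.join(data_dir, "backups")
os.makedirs(backups)

sstable = os.path.join(data_dir, "ks-cf-1-Data.db")
with open(sstable, "w") as f:
    f.write("sstable contents")

# This is the cheap operation incremental backups perform on flush:
# a hard link, i.e. a second name for the same inode on the same filesystem.
link = os.path.join(backups, "ks-cf-1-Data.db")
os.link(sstable, link)
assert os.stat(sstable).st_ino == os.stat(link).st_ino

# If the live SSTable is later deleted (e.g. by compaction), the linked
# backup copy still refers to the same bytes on disk.
os.remove(sstable)
with open(link) as f:
    print(f.read())  # sstable contents
```

This is also why, as noted elsewhere in the thread, a hard link cannot be "moved" to another filesystem: it is just another name for the same on-disk data.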


Re: Cassandra backup question regarding commitlogs

2012-04-29 Thread Roshan
Tamar

Please don't jump into other users' discussions. If you want to ask about an
issue, please create a new thread.

Thanks. 




Re: Cassandra backup question regarding commitlogs

2012-04-29 Thread aaron morton
Each mutation is applied to the commit log before being applied to the
memtable. On server start, the SSTables are read before the commit logs are
replayed. This is part of the crash-only software design and happens on every
start.

AFAIK there is no facility to snapshot commit log files as they are closed. The
best advice would be to keep them on a mirror set for durability. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/04/2012, at 1:04 AM, Roshan wrote:

 Hi
 
 Currently I am taking daily snapshot on my keyspace in production and
 already enable the incremental backups as well.
 
 According to the documentation, the incremental backup option will create a
 hard link in the backup folder when a new SSTable is flushed. A snapshot will
 copy all the data/index/etc. files to a new folder.
 
 *Question:*
 What will happen (with incremental backup enabled) if Cassandra crashes (for
 any reason) before flushing the data as an SSTable (inserted data still in
 the commit log)? In this case, how can I back up/restore the data?
 
 Do I need to back up the commit logs as well and replay them during server
 start to restore the data in the commit log files?
 
 Thanks.
 
 
 
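The write path Aaron describes (commit log first, then memtable, with a flush turning the memtable into an SSTable) can be modelled in a few lines. This is a toy sketch, not Cassandra's actual implementation:

```python
# Toy model of the Cassandra write path: every mutation is appended to the
# commit log before touching the memtable, and a flush turns the memtable
# into an immutable "SSTable" while the commit log preserves durability.
commit_log = []
memtable = {}
sstables = []

def write(key, value):
    commit_log.append((key, value))  # 1. durable append first
    memtable[key] = value            # 2. then the in-memory table

def flush():
    sstables.append(dict(memtable))  # memtable becomes an immutable SSTable
    memtable.clear()

write("k1", "v1")
write("k2", "v2")
flush()
write("k3", "v3")  # exists only in the commit log and memtable so far

print(len(sstables), len(commit_log), memtable)
# 1 3 {'k3': 'v3'}
```

In this model, "k3" is exactly the case Roshan asks about: a crash before the next flush loses the memtable, but the mutation is still in the commit log and is replayed on the next start.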



Re: Cassandra backup question regarding commitlogs

2012-04-29 Thread aaron morton
 1. If I already have a Cassandra cluster running, would changing the  
 incremental_backups parameter in the cassandra.yaml of each node, and then 
 restarting it, do the trick?
Yes, it is a per-node setting. 

 2. Assuming I am creating a daily snapshot, what is the gain from setting 
 incremental backup to true?

Better point-in-time recovery on a node. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/04/2012, at 6:41 PM, Tamar Fraenkel wrote:

 I want to add a couple of questions regarding incremental backups:
 1. If I already have a Cassandra cluster running, would changing the
 incremental_backups parameter in the cassandra.yaml of each node, and then
 restarting it, do the trick?
 2. Assuming I am creating a daily snapshot, what is the gain from setting
 incremental backup to true?
 
 Thanks,
 Tamar
 
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 
 
 
 
 On Sat, Apr 28, 2012 at 4:04 PM, Roshan codeva...@gmail.com wrote:
 Hi
 
 Currently I am taking daily snapshot on my keyspace in production and
 already enable the incremental backups as well.
 
 According to the documentation, the incremental backup option will create a
 hard link in the backup folder when a new SSTable is flushed. A snapshot will
 copy all the data/index/etc. files to a new folder.
 
 *Question:*
 What will happen (with incremental backup enabled) if Cassandra crashes (for
 any reason) before flushing the data as an SSTable (inserted data still in
 the commit log)? In this case, how can I back up/restore the data?
 
 Do I need to back up the commit logs as well and replay them during server
 start to restore the data in the commit log files?
 
 Thanks.
 
 
 
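For reference, the per-node setting discussed in this thread lives in cassandra.yaml. A minimal fragment (defaults may differ between Cassandra versions, so check the yaml shipped with your release):

```yaml
# cassandra.yaml -- per-node setting; as Aaron notes above, a node restart
# is required for a change to take effect.
incremental_backups: true
```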