Re: Cassandra backup to alternate location
No - they'll hardlink into the snapshot folder on each data directory. They are true hard links, so even if you could move them, they'd still be on the same filesystem. Typical practice is to issue a snapshot and then copy the data out as needed (using something like https://github.com/JeremyGrosser/tablesnap ). On Thu, Jun 28, 2018 at 10:00 AM, Lohchab, Sanjeev wrote: > Hi All, > > I am trying to back up the Cassandra DB, but by default it is saving the > snapshots in the default location. > > Is there any way we can specify the location where we want to store the > snapshots? > > Regards > > Sanjeev >
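Since snapshots are hard links, creating them costs no extra space, and a link cannot point to a different filesystem. A quick runnable demonstration of the property (the file names are illustrative, not Cassandra's actual layout):

```shell
# A hard link is a second name for the same inode: no extra space is
# used, and the link must live on the same filesystem as the original.
tmp=$(mktemp -d)
echo "sstable contents" > "$tmp/ka-1-Data.db"
ln "$tmp/ka-1-Data.db" "$tmp/snapshot-link-Data.db"   # like nodetool snapshot
# Both names report the same inode number:
stat -c '%i' "$tmp/ka-1-Data.db"
stat -c '%i' "$tmp/snapshot-link-Data.db"
```

This is why moving a snapshot to an alternate location means copying it out (rsync, s3cmd, tablesnap); the links themselves cannot be relocated.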
Re: Cassandra backup via snapshots in production
On Thu, Nov 27, 2014 at 2:34 AM, Jens Rantil jens.ran...@tink.se wrote: Late answer; You can find my backup script here: https://gist.github.com/JensRantil/a8150e998250edfcd1a3 Why not use the much more robustly designed and maintained community-based project, tablesnap? https://github.com/JeremyGrosser/tablesnap =Rob
Re: Cassandra backup via snapshots in production
On Mon, Dec 1, 2014 at 8:39 PM, Robert Coli rc...@eventbrite.com wrote: Why not use the much more robustly designed and maintained community-based project, tablesnap? For two reasons: - Because I am tired of the deployment model of Python apps, which requires me to set up virtual environments. - Because, AFAIK, it did not support (asymmetric) encryption before uploading. -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919 Twitter https://twitter.com/tink
RE: Cassandra backup via snapshots in production
Thanks a lot for your answers! What we plan to do is:
- auto_snapshot = true
- if the human error happened on D-5:
  o we will bring the cluster offline
  o purge all data
  o import snapshots taken prior to D-5 (and delete snapshots after D-5)
  o upload all missing data between D-5 and D
  o bring the cluster online
Do you think it would work?
From: Jens Rantil [mailto:jens.ran...@tink.se] Sent: Tuesday 25 November 2014 10:03 To: user@cassandra.apache.org Subject: Re: Cassandra backup via snapshots in production “Truncate does trigger snapshot creation though” Doesn’t it? With “auto_snapshot: true” it should. ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan doanduy...@gmail.com wrote: True. A DELETE in CQL just creates a tombstone, so from the storage engine's point of view it is just adding some physical columns. Truncate does trigger snapshot creation though. On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote: On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote: “The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).” If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. OP includes delete in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE. =Rob http://twitter.com/rcolidba This message and any attachments (the message) is intended solely for the intended addressees and is confidential. If you receive this message in error, or are not the intended recipient(s), please delete it and any copies from your systems and immediately notify the sender.
Any unauthorized view, use that does not comply with its purpose, dissemination or disclosure, either whole or partial, is prohibited. Since the internet cannot guarantee the integrity of this message which may not be reliable, BNP PARIBAS (and its subsidiaries) shall not be liable for the message if modified, changed or falsified. Do not print this message unless it is necessary,consider the environment. -- Ce message et toutes les pieces jointes (ci-apres le message) sont etablis a l'intention exclusive de ses destinataires et sont confidentiels. Si vous recevez ce message par erreur ou s'il ne vous est pas destine, merci de le detruire ainsi que toute copie de votre systeme et d'en avertir immediatement l'expediteur. Toute lecture non autorisee, toute utilisation de ce message qui n'est pas conforme a sa destination, toute diffusion ou toute publication, totale ou partielle, est interdite. L'Internet ne permettant pas d'assurer l'integrite de ce message electronique susceptible d'alteration, BNP Paribas (et ses filiales) decline(nt) toute responsabilite au titre de ce message dans l'hypothese ou il aurait ete modifie, deforme ou falsifie. N'imprimez ce message que si necessaire, pensez a l'environnement.
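The restore step in the plan above depends on selecting only snapshots taken before the day of the error. A minimal, hedged sketch of that selection, assuming snapshot tags carry a date suffix like backup_YYYYMMDD (an assumption about naming, not nodetool's default):

```shell
# Keep only snapshot tags dated on or before a cutoff day (D-5 in the
# plan above). The backup_YYYYMMDD tag format is an assumption.
keep_on_or_before() {
    cutoff=$1; shift
    for tag in "$@"; do
        day=${tag#backup_}                      # strip the prefix
        [ "$day" -le "$cutoff" ] && echo "$tag" # emit tags <= cutoff
    done
}
keep_on_or_before 20141120 backup_20141118 backup_20141121 backup_20141119
# prints: backup_20141118
#         backup_20141119
```

The snapshots this emits would be restored; the later ones would be deleted, and the missing D-5..D data re-imported from the external source.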
Re: Cassandra backup via snapshots in production
Late answer; You can find my backup script here: https://gist.github.com/JensRantil/a8150e998250edfcd1a3 Basically you need to set S3_BUCKET and PGP_KEY_RECIPIENT, configure s3cmd (using s3cmd --configure) and then issue `./backup-keyspace.sh your-keyspace` to back it up to S3. The script is run periodically on every node. Regarding “s3cmd --configure”, I executed it once and then copied “~/.s3cfg” to all nodes. Like I said, there’s lots of love that can be put into a backup system. Note that the script has the following limitations:
* It does not checksum the files. However, the s3cmd website states that by default it compares MD5 and file size on upload.
* It does not purge files on S3 (which you could configure using “Object Lifecycles”).
* It does not warn you when a backup fails. Check your logs periodically.
* It does not do any advanced logging. Make sure to pipe the output to a file or the `syslog` utility.
* It does not do continuous/point-in-time backup.
That said, it does its job for us for now. Feel free to propose improvements! Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Fri, Nov 21, 2014 at 7:36 PM, William Arbaugh w...@cs.umd.edu wrote: Jens, I'd be interested in seeing your script. We've been thinking of doing exactly that but uploading to Glacier instead. Thanks, Bill On Nov 21, 2014, at 11:40 AM, Jens Rantil jens.ran...@tink.se wrote: “The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).” If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. Regarding backup, I have a small script that creates a named snapshot and, for each sstable: encrypts it, uploads it to S3 and deletes the snapshotted sstable. It took me an hour to write and roll out to all our nodes.
The whole process is currently logged, but eventually I will also send an e-mail if a backup fails. ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com wrote: Hello all, We are looking for a solution to back up data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …). We are thinking of: - Backup: add a 2TB HDD on each node for C* daily/weekly snapshots. - Restore: load the most recent snapshots, or the latest “non-corrupted” ones, and replay missing data imports from the other data source. We would like to know if somebody is using Cassandra’s backup feature in production and could share their experience with us. Your help would be greatly appreciated. Best regards, Minh
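A hedged sketch of the per-sstable encrypt-and-upload loop the script describes. The bucket, recipient, snapshot tag and data path are all illustrative assumptions, not the gist's actual code:

```shell
# Encrypt each snapshotted sstable with GPG, upload it to S3, then
# delete the local copies. All names and paths are made up.
S3_BUCKET=my-backups
PGP_KEY_RECIPIENT=backup@example.com
DATA=/var/lib/cassandra/data
for f in "$DATA"/my_keyspace/*/snapshots/my_tag/*; do
    [ -e "$f" ] || continue   # skip if no snapshot files are present
    gpg --encrypt --recipient "$PGP_KEY_RECIPIENT" --output "$f.gpg" "$f"
    # Key the object by host and relative path so nodes don't collide
    s3cmd put "$f.gpg" "s3://$S3_BUCKET/$(hostname)/${f#$DATA/}.gpg"
    rm -- "$f" "$f.gpg"
done
```

Deleting each snapshotted sstable as it is uploaded keeps the extra disk usage bounded, at the cost of having to re-snapshot if an upload fails midway.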
Re: Cassandra backup via snapshots in production
True. A DELETE in CQL just creates a tombstone, so from the storage engine's point of view it is just adding some physical columns. Truncate does trigger snapshot creation though. On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote: On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote: “The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).” If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. OP includes delete in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE. =Rob http://twitter.com/rcolidba
Re: Cassandra backup via snapshots in production
“Truncate does trigger snapshot creation though” Doesn’t it? With “auto_snapshot: true” it should. ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan doanduy...@gmail.com wrote: True. A DELETE in CQL just creates a tombstone, so from the storage engine's point of view it is just adding some physical columns. Truncate does trigger snapshot creation though. On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote: On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote: “The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).” If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. OP includes delete in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE. =Rob http://twitter.com/rcolidba
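For reference, the setting under discussion lives in cassandra.yaml. With it enabled, TRUNCATE and DROP snapshot the table's data first; ordinary DELETEs only write tombstones and are not snapshotted:

```yaml
# cassandra.yaml
# Snapshot a table's data before it is truncated or dropped.
# Note: this does NOT protect against row-level DELETEs.
auto_snapshot: true
```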
Re: Cassandra backup via snapshots in production
“The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).” If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. Regarding backup, I have a small script that creates a named snapshot and, for each sstable: encrypts it, uploads it to S3 and deletes the snapshotted sstable. It took me an hour to write and roll out to all our nodes. The whole process is currently logged, but eventually I will also send an e-mail if a backup fails. ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com wrote: Hello all, We are looking for a solution to back up data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …). We are thinking of: - Backup: add a 2TB HDD on each node for C* daily/weekly snapshots. - Restore: load the most recent snapshots, or the latest “non-corrupted” ones, and replay missing data imports from the other data source. We would like to know if somebody is using Cassandra’s backup feature in production and could share their experience with us. Your help would be greatly appreciated. Best regards, Minh
Re: Cassandra backup via snapshots in production
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote: “The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …). If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you.” OP includes delete in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE. =Rob http://twitter.com/rcolidba
Re: Cassandra backup via snapshots in production
On Tue, Nov 18, 2014 at 6:50 AM, Ngoc Minh VO ngocminh...@bnpparibas.com wrote: We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). https://github.com/JeremyGrosser/tablesnap =Rob
Re: cassandra backup
Hi Marcelo, Cassandra provides an eventually consistent model for backups. You can do staggered backups of data, with the idea that if you restore a node and then run a repair, your data will once again be consistent. Cassandra will not automatically copy the data to other nodes (other than via hinted handoff); you should manually run repair after restoring a node. You should take snapshots when doing a backup, as they keep the data you are backing up relevant to a single point in time; otherwise compaction could add/delete files on you mid-backup, or worse, I imagine, you could try to read an SSTable mid-write. Snapshots work by using hard links and don't take additional storage to create. In our process we create the snapshot, perform the backup, and then clear the snapshot. One thing to keep in mind in your S3 cost analysis is that, even though storage is cheap, reads/writes to S3 are not (especially writes). If you are using LeveledCompaction, or otherwise have a ton of SSTables, some people have encountered increased costs moving the data to S3. Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync data to. Thus far this has worked very well for us. -Mike On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hello everyone, I am trying to create backups of my data on AWS. My goal is to store the backups on S3 or Glacier, as it's cheap to store this kind of data. So, if I have a cluster with N nodes, I would like to copy data from all N nodes to S3 and be able to restore later. I know Priam does that (we were using it), but I am using the latest Cassandra version and we plan to use DSE some time; I am not sure Priam fits this case. I took a look at the docs: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html And I am trying to understand if it's really necessary to take a snapshot to create my backup.
Suppose I do a flush and copy the sstables from each node, one by one, to S3. Not all at the same time, but one by one. When I try to restore my backup, data from node 1 will be older than data from node 2. Will this cause problems? AFAIK, if I am using a replication factor of 2, for instance, and Cassandra sees data on node X only, it will automatically copy it to other nodes, right? Is there any chance of Cassandra nodes becoming corrupt somehow if I do my backups this way? Best regards, Marcelo Valle.
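The snapshot, back up, clear-snapshot cycle described above can be simulated end to end with plain directories (a toy stand-in for a real data directory; every path and file name here is made up):

```shell
# Toy version of: nodetool snapshot -> copy the snapshot off-node ->
# nodetool clearsnapshot. Hard links make the snapshot free to create.
data=$(mktemp -d)     # stand-in for /var/lib/cassandra/data/<ks>/<table>
backup=$(mktemp -d)   # stand-in for the backup HDD / S3 staging area
echo "sstable-1" > "$data/ka-1-Data.db"
# 1. "snapshot": hardlink the live sstables into snapshots/<tag>
mkdir -p "$data/snapshots/daily"
ln "$data"/*-Data.db "$data/snapshots/daily/"
# 2. back up the immutable snapshot, never the live files
cp -a "$data/snapshots/daily/." "$backup/"
# 3. "clearsnapshot": drop the links once the copy is safe
rm -r "$data/snapshots/daily"
ls "$backup"
# prints: ka-1-Data.db
```

Backing up the snapshot rather than the live directory is what protects against compaction adding or deleting files mid-copy.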
Re: cassandra backup
You should look at this - https://github.com/amorton/cassback I don't believe it's set up to use 1.2.10 and above, but I believe it's just small tweaks to get it running. Thanks Rahul On Fri, Dec 6, 2013 at 7:09 PM, Michael Theroux mthero...@yahoo.com wrote: Hi Marcelo, Cassandra provides an eventually consistent model for backups. You can do staggered backups of data, with the idea that if you restore a node and then run a repair, your data will once again be consistent. Cassandra will not automatically copy the data to other nodes (other than via hinted handoff); you should manually run repair after restoring a node. You should take snapshots when doing a backup, as they keep the data you are backing up relevant to a single point in time; otherwise compaction could add/delete files on you mid-backup, or worse, I imagine, you could try to read an SSTable mid-write. Snapshots work by using hard links and don't take additional storage to create. In our process we create the snapshot, perform the backup, and then clear the snapshot. One thing to keep in mind in your S3 cost analysis is that, even though storage is cheap, reads/writes to S3 are not (especially writes). If you are using LeveledCompaction, or otherwise have a ton of SSTables, some people have encountered increased costs moving the data to S3. Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync data to. Thus far this has worked very well for us. -Mike On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hello everyone, I am trying to create backups of my data on AWS. My goal is to store the backups on S3 or Glacier, as it's cheap to store this kind of data. So, if I have a cluster with N nodes, I would like to copy data from all N nodes to S3 and be able to restore later. I know Priam does that (we were using it), but I am using the latest Cassandra version and we plan to use DSE some time; I am not sure Priam fits this case.
I took a look at the docs: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html And I am trying to understand if it's really necessary to take a snapshot to create my backup. Suppose I do a flush and copy the sstables from each node, one by one, to S3. Not all at the same time, but one by one. When I try to restore my backup, data from node 1 will be older than data from node 2. Will this cause problems? AFAIK, if I am using a replication factor of 2, for instance, and Cassandra sees data on node X only, it will automatically copy it to other nodes, right? Is there any chance of Cassandra nodes becoming corrupt somehow if I do my backups this way? Best regards, Marcelo Valle.
Re: cassandra backup
I believe SSTables are written to a temporary file and then moved. If I remember correctly, tools like tablesnap listen for the inotify event IN_MOVED_TO. This should handle the "back up an sstable while it's mid-write" issue. On Fri, Dec 6, 2013 at 5:39 AM, Michael Theroux mthero...@yahoo.com wrote: Hi Marcelo, Cassandra provides an eventually consistent model for backups. You can do staggered backups of data, with the idea that if you restore a node and then run a repair, your data will once again be consistent. Cassandra will not automatically copy the data to other nodes (other than via hinted handoff); you should manually run repair after restoring a node. You should take snapshots when doing a backup, as they keep the data you are backing up relevant to a single point in time; otherwise compaction could add/delete files on you mid-backup, or worse, I imagine, you could try to read an SSTable mid-write. Snapshots work by using hard links and don't take additional storage to create. In our process we create the snapshot, perform the backup, and then clear the snapshot. One thing to keep in mind in your S3 cost analysis is that, even though storage is cheap, reads/writes to S3 are not (especially writes). If you are using LeveledCompaction, or otherwise have a ton of SSTables, some people have encountered increased costs moving the data to S3. Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync data to. Thus far this has worked very well for us. -Mike On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hello everyone, I am trying to create backups of my data on AWS. My goal is to store the backups on S3 or Glacier, as it's cheap to store this kind of data. So, if I have a cluster with N nodes, I would like to copy data from all N nodes to S3 and be able to restore later.
I know Priam does that (we were using it), but I am using the latest Cassandra version and we plan to use DSE some time; I am not sure Priam fits this case. I took a look at the docs: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html And I am trying to understand if it's really necessary to take a snapshot to create my backup. Suppose I do a flush and copy the sstables from each node, one by one, to S3. Not all at the same time, but one by one. When I try to restore my backup, data from node 1 will be older than data from node 2. Will this cause problems? AFAIK, if I am using a replication factor of 2, for instance, and Cassandra sees data on node X only, it will automatically copy it to other nodes, right? Is there any chance of Cassandra nodes becoming corrupt somehow if I do my backups this way? Best regards, Marcelo Valle. -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
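A hedged sketch of the watch-for-rename pattern using inotify-tools (the watch path and the upload command are assumptions; tablesnap itself does the equivalent in Python via the same IN_MOVED_TO event):

```shell
# Upload sstables only after the atomic rename that finishes a write.
# The filter is runnable; the watcher needs inotify-tools and a real
# data directory, so it is shown commented out.
is_sstable() {
    case "$1" in
        *-Data.db) return 0 ;;   # a finished sstable data file
        *)         return 1 ;;   # tmp files, indexes, etc.
    esac
}
# inotifywait -m -r -e moved_to --format '%w%f' /var/lib/cassandra/data |
# while read -r path; do
#     is_sstable "$path" && s3cmd put "$path" "s3://my-backups/$(hostname)/"
# done
is_sstable /data/ks/tbl/ka-1-Data.db && echo "would upload"
# prints: would upload
```

Reacting to IN_MOVED_TO rather than IN_CREATE is what avoids uploading a file that is still being written.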
Re: cassandra backup
On Fri, Dec 6, 2013 at 5:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I am trying to create backups of my data on AWS. My goal is to store the backups on S3 or glacier, as it's cheap to store this kind of data. So, if I have a cluster with N nodes, I would like to copy data from all N nodes to S3 and be able to restore later. https://github.com/synack/tablesnap Automated backup, restore, purging, intended for use with Cassandra. =Rob
Re: Cassandra backup
There is this: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement But you'll need to design your data model around the fact that this is only as granular as one column family. Best, michael From: Kanwar Sangha kan...@mavenir.com Reply-To: user@cassandra.apache.org Date: Monday, February 18, 2013 6:06 PM To: user@cassandra.apache.org Subject: Cassandra backup Hi – We have a requirement to store around 90 days of data per user. The last 7 days of data are going to be accessed frequently. Is there a way we can have the recent data (7 days) on SSD and the rest of the data on HDD? Do we take a snapshot every 7 days and use a separate ‘archive’ cluster to serve the old data and an ‘active’ cluster to serve recent data? Any links/thoughts would be helpful. Thanks, Kanwar
RE: Cassandra backup
Thanks. I will look into the details. One issue I see is that if I have only one column family, which needs only the last 7 days of data on SSD and the rest on HDD, how will that work? From: Michael Kjellman [mailto:mkjell...@barracuda.com] Sent: 18 February 2013 20:08 To: user@cassandra.apache.org Subject: Re: Cassandra backup There is this: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement But you'll need to design your data model around the fact that this is only as granular as one column family. Best, michael From: Kanwar Sangha kan...@mavenir.com Reply-To: user@cassandra.apache.org Date: Monday, February 18, 2013 6:06 PM To: user@cassandra.apache.org Subject: Cassandra backup Hi - We have a requirement to store around 90 days of data per user. The last 7 days of data are going to be accessed frequently. Is there a way we can have the recent data (7 days) on SSD and the rest of the data on HDD? Do we take a snapshot every 7 days and use a separate 'archive' cluster to serve the old data and an 'active' cluster to serve recent data? Any links/thoughts would be helpful. Thanks, Kanwar
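For context, the 1.1 feature linked above gives each keyspace and column family its own directory on disk, which is what makes per-table device placement possible at all. A minimal sketch, with illustrative mount points (Cassandra itself spreads sstables across the listed directories; pinning one table's directory to a specific device is typically done with a mount or symlink, an operational trick rather than a config option):

```yaml
# cassandra.yaml (paths are illustrative)
data_file_directories:
    - /mnt/ssd/cassandra/data    # e.g. hot, frequently read tables
    - /mnt/hdd/cassandra/data    # bulk / archival data
```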
Re: Cassandra backup question regarding commitlogs
Incremental backups are generated when a flush completes (only during the flush). If the node crashes before the flush completes, then the commit logs on the local node are the only copy of the data that was in memory. It wouldn't help to copy the commit logs across, because they are not immutable (they are recycled). There is commit log backup in 1.1.1 (yet to be released): https://issues.apache.org/jira/browse/CASSANDRA-3690 Thanks, /VJ On Sun, Apr 29, 2012 at 3:29 PM, Roshan codeva...@gmail.com wrote: Hi Currently I am taking a daily snapshot of my keyspace in production and have already enabled incremental backups as well. According to the documentation, the incremental backup option will create a hard link in the backup folder when a new sstable is flushed. A snapshot will copy all the data/index/etc. files to a new folder. Question: What will happen (with incremental backup enabled) when Cassandra crashes (due to any reason) before flushing the data as an SSTable (inserted data still in the commitlog)? In this case how can I backup/restore data? Do I need to back up the commitlogs as well and replay them during server start to restore the data in the commitlog files? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-question-regarding-commitlogs-tp7511918.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra backup question regarding commitlogs
If you delete the commit logs you are rolling back to exactly what was in the snapshot. When you take a snapshot it flushes the memtables first, so there is nothing in the commit log that is not in the snapshot. Rolling back to a snapshot is rollback to that point in time. If you want to restore to any point in time you need snapshots + incremental snapshot + commit log (for things that have not made it to sstables). Otherwise there is a potential loss of data that has not been flushed to disk. This is different to what the DS docs are talking about. I'm not sure why they are saying delete the commit log, try asking on their forum http://www.datastax.com/support-forums/ Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/05/2012, at 12:02 PM, Roshan wrote: Any help regarding this is appreciated. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-queston-regarding-commitlogs-tp7508823p7518544.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra backup question regarding commitlogs
Many thanks Aaron. I will post a support issue for them. But will keep the snapshot + incremental backups + commitlogs to recover any failure situation. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-queston-regarding-commitlogs-tp7508823p7518866.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra backup question regarding commitlogs
Hi Aaron Thanks for the comments. Yes, for durability I will keep them in a safe place. But in such a crash situation, how can I restore the data (because it is not in an SSTable, only in the commit log)? Do I need to replay only that commit log when the server starts after the crash? Will it override the same keys with values? Appreciate your reply on this. Kind Regards /Roshan -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/deleted-tp7508823p7512499.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra backup question regarding commitlogs
When the server starts it reads the SSTables and then applies the commit logs. There is nothing you need to do other than leave the commit logs where they are. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 30/04/2012, at 6:02 PM, Roshan wrote: Hi Aaron Thanks for the comments. Yes, for durability I will keep them in a safe place. But in such a crash situation, how can I restore the data (because it is not in an SSTable, only in the commit log)? Do I need to replay only that commit log when the server starts after the crash? Will it override the same keys with values? Appreciate your reply on this. Kind Regards /Roshan -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/deleted-tp7508823p7512499.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra backup question regarding commitlogs
Many Thanks Aaron. According to the DataStax restore documentation, they ask to remove the commitlogs before restoring ("Clear all files in /var/lib/cassandra/commitlog (by default)"). In that case it is better not to follow this step in a server crash situation. Thanks /Roshan -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-queston-regarding-commitlogs-tp7508823p7515217.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra backup question regarding commitlogs
Can you provide a link to that page? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 1/05/2012, at 10:12 AM, Roshan wrote: Many Thanks Aaron. According to the DataStax restore documentation, they ask to remove the commitlogs before restoring ("Clear all files in /var/lib/cassandra/commitlog (by default)"). In that case it is better not to follow this step in a server crash situation. Thanks /Roshan -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-queston-regarding-commitlogs-tp7508823p7515217.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra backup question regarding commitlogs
I want to add a couple of questions regrading incremental backups: 1. If I already have a Cassandra cluster running, would changing the i ncremental_backups parameter in the cassandra.yaml of each node, and then restart it do the trick? 2. Assuming I am creating a daily snapshot, what is the gain from setting incremental backup to true? Thanks, Tamar *Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Sat, Apr 28, 2012 at 4:04 PM, Roshan codeva...@gmail.com wrote: Hi Currently I am taking daily snapshot on my keyspace in production and already enable the incremental backups as well. According to the documentation, the incremental backup option will create an hard-link to the backup folder when new sstable is flushed. Snapshot will copy all the data/index/etc. files to a new folder. *Question:* What will happen (with enabling the incremental backup) when crash (due to any reason) the Cassandra before flushing the data as a SSTable (inserted data still in commitlog). In this case how can I backup/restore data? Do I need to backup the commitlogs as well and and replay during the server start to restore the data in commitlog files? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-backup-queston-regarding-commitlogs-tp7508823.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com. tokLogo.png
Re: Cassandra backup question regarding commitlogs
Tamar, please don't jump into other users' discussions. If you want to ask about an issue, please create a new thread. Thanks.
Re: Cassandra backup question regarding commitlogs
Each mutation is applied to the commit log before being applied to the memtable. On server start the SSTables are read before replaying the commit logs. This is part of the crash-only software design and happens on every start. AFAIK there is no facility to snapshot commit log files as they are closed. The best advice would be to keep them on a mirror set for durability. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/04/2012, at 1:04 AM, Roshan wrote: Hi Currently I am taking a daily snapshot of my keyspace in production and have already enabled incremental backups as well. According to the documentation, the incremental backup option will create a hard link in the backups folder when a new SSTable is flushed; a snapshot will copy all the data/index/etc. files to a new folder. *Question:* What will happen (with incremental backup enabled) if Cassandra crashes (for any reason) before flushing the data as an SSTable (inserted data still in the commit log)? In this case how can I back up/restore the data? Do I need to back up the commit logs as well and replay them during server start to restore the data in the commit log files? Thanks.
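The hard-link behaviour described above is what makes incremental backups cheap, and it can be demonstrated with plain ln: the file in backups/ is the same inode as the flushed SSTable, so it uses no extra disk space and survives removal of the original path. A minimal sketch with made-up file names:

```shell
# Demonstration of the hard-link mechanics behind incremental backups.
# File and directory names are illustrative stand-ins.
WORK="$(mktemp -d)"
echo "sstable contents" > "$WORK/mytable-1-Data.db"     # stand-in for a freshly flushed SSTable
mkdir "$WORK/backups"
ln "$WORK/mytable-1-Data.db" "$WORK/backups/mytable-1-Data.db"  # what incremental backup does on flush
rm "$WORK/mytable-1-Data.db"                            # e.g. the original is removed by compaction
cat "$WORK/backups/mytable-1-Data.db"                   # prints: sstable contents
```

Note this covers flushed SSTables only; as said above, data still sitting in the commit log is not captured by snapshots or incremental backups.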
Re: Cassandra backup question regarding commitlogs
1. If I already have a Cassandra cluster running, would changing the incremental_backups parameter in the cassandra.yaml of each node, and then restarting it, do the trick? Yes, it is a per-node setting. 2. Assuming I am creating a daily snapshot, what is the gain from setting incremental backup to true? Better point-in-time recovery on a node. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/04/2012, at 6:41 PM, Tamar Fraenkel wrote: I want to add a couple of questions regarding incremental backups: 1. If I already have a Cassandra cluster running, would changing the incremental_backups parameter in the cassandra.yaml of each node, and then restarting it, do the trick? 2. Assuming I am creating a daily snapshot, what is the gain from setting incremental backup to true? Thanks, Tamar Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Sat, Apr 28, 2012 at 4:04 PM, Roshan codeva...@gmail.com wrote: Hi Currently I am taking a daily snapshot of my keyspace in production and have already enabled incremental backups as well. According to the documentation, the incremental backup option will create a hard link in the backups folder when a new SSTable is flushed; a snapshot will copy all the data/index/etc. files to a new folder. *Question:* What will happen (with incremental backup enabled) if Cassandra crashes (for any reason) before flushing the data as an SSTable (inserted data still in the commit log)? In this case how can I back up/restore the data? Do I need to back up the commit logs as well and replay them during server start to restore the data in the commit log files? Thanks.
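For reference, since it is a per-node setting, enabling it is a one-line change in cassandra.yaml on every node, each followed by a restart of that node:

```yaml
# cassandra.yaml (set on each node, then restart the node for it to take effect)
incremental_backups: true
```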