Re: Cassandra backup via snapshots in production

2014-12-01 Thread Jens Rantil
On Mon, Dec 1, 2014 at 8:39 PM, Robert Coli  wrote:

> Why not use the much more robustly designed and maintained community based
> project, tablesnap?


For two reasons:

   - Because I am tired of the deployment model of Python apps which
   require me to set up virtual environments.
   - Because it did, AFAIK, not support (asymmetric) encryption before
   uploading.

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Cassandra backup via snapshots in production

2014-12-01 Thread Robert Coli
On Thu, Nov 27, 2014 at 2:34 AM, Jens Rantil  wrote:

> Late answer; You can find my backup script here:
> https://gist.github.com/JensRantil/a8150e998250edfcd1a3
>

Why not use the much more robustly designed and maintained community based
project, tablesnap?

https://github.com/JeremyGrosser/tablesnap

=Rob


Re: Cassandra backup via snapshots in production

2014-11-27 Thread Jens Rantil
Late answer: you can find my backup script here:
https://gist.github.com/JensRantil/a8150e998250edfcd1a3

Basically, you need to set S3_BUCKET and PGP_KEY_RECIPIENT, configure s3cmd (using
`s3cmd --configure`), and then run `./backup-keyspace.sh your-keyspace` to
back that keyspace up to S3. We run the script periodically on every node.
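
A minimal sketch of that flow, written in Python rather than shell and not taken from the gist. It only builds the commands for one run so they can be inspected before anything is executed. The function name `plan_backup`, the `node_name` parameter, the `.gpg` suffix and the S3 key layout are assumptions made for illustration; S3_BUCKET and PGP_KEY_RECIPIENT are the variables mentioned above. The caller supplies the snapshot file paths, which in a real run would be listed only after the snapshot exists.

```python
import os
from pathlib import Path


def plan_backup(keyspace, snapshot_tag, sstable_paths, node_name, env=None):
    """Build, but do not run, the commands for backing up one keyspace."""
    env = os.environ if env is None else env
    bucket = env["S3_BUCKET"]              # raises KeyError if unset
    recipient = env["PGP_KEY_RECIPIENT"]   # raises KeyError if unset

    # Step 1: take a named snapshot of the keyspace on this node.
    commands = [["nodetool", "snapshot", "-t", snapshot_tag, keyspace]]

    for path in map(Path, sstable_paths):
        encrypted = path.with_name(path.name + ".gpg")
        # Hypothetical key layout: one prefix per node, keyspace and snapshot.
        target = f"s3://{bucket}/{node_name}/{keyspace}/{snapshot_tag}/{encrypted.name}"
        # Step 2: encrypt with the recipient's public key (asymmetric).
        commands.append(["gpg", "--encrypt", "--recipient", recipient,
                         "--output", str(encrypted), str(path)])
        # Step 3: upload the encrypted copy.
        commands.append(["s3cmd", "put", str(encrypted), target])
        # Step 4: delete the snapshotted sstable and its encrypted copy.
        commands.append(["rm", "--", str(path), str(encrypted)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, which raises on the first failing step instead of silently continuing.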




Regarding `s3cmd --configure`, I executed it once and then copied `~/.s3cfg` to 
all nodes.

Like I said, there’s lots of love that can be put into a backup system. Note 
that the script has the following limitations:

 * It does not checksum the files. However, the s3cmd website states that it 
compares MD5 and file size on upload by default.

 * It does not purge old files from S3 (which you could configure using 
“Object Lifecycles”).

 * It does not warn you when a backup fails. Check your logs periodically.

 * It does not do any advanced logging. Make sure to pipe the output to a file 
or the `syslog` utility.

 * It does not do continuous/point-in-time backup.
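
To illustrate the first limitation, here is a hedged sketch of how a checksum could be recorded for each file before upload. It is not part of the script; the helper names `md5_hex` and `manifest_line` are made up for this example, and MD5 is used only because it is the digest mentioned above for s3cmd.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_line(path):
    """Format one line in the style of `md5sum` output: '<digest>  <path>'."""
    return f"{md5_hex(path)}  {path}"
```

Writing one such line per sstable to a manifest and uploading it next to the data would let a restore verify every file independently of the upload tool.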




That said, it does its job for us for now.

Feel free to propose improvements!

Cheers,
Jens



On Fri, Nov 21, 2014 at 7:36 PM, William Arbaugh  wrote:

> Jens,
> I'd be interested in seeing your script. We've been thinking of doing exactly 
> that but uploading to Glacier instead.
> Thanks, Bill
>> On Nov 21, 2014, at 11:40 AM, Jens Rantil  wrote:
>> 
>> > The main purpose is to protect us from human errors (eg. unexpected 
>> > manipulations: delete, drop tables, …).
>> 
>> If that is the main purpose, having "auto_snapshot: true” in cassandra.yaml 
>> will be enough to protect you.
>> 
>> Regarding backup, I have a small script that creates a named snapshot and 
>> for each sstable; encrypts, uploads to S3 and deletes the snapshotted 
>> sstable. It took me an hour to write and roll out to all our nodes. The 
>> whole process is currently logged, but eventually I will also send an e-mail 
>> if backup fails.
>> 
>> 
>> On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO  
>> wrote:
>> 
>> Hello all,
>> 
>> We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 
>> nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
>> 
>> The main purpose is to protect us from human errors (eg. unexpected 
>> manipulations: delete, drop tables, …).
>> 
>> We are thinking of:
>> 
>> -  Backup: add a 2TB HDD on each node for C* daily/weekly snapshots.
>> -  Restore: load the most recent snapshots or latest “non-corrupted” 
>> ones and replay missing data imports from other data source.
>> 
>> We would like to know if anybody is using Cassandra’s backup feature in 
>> production and could share their experience with us.
>> 
>> Your help would be greatly appreciated.
>> 
>> Best regards,
>> Minh
>> 
>> This message and any attachments (the "message") is
>> intended solely for the intended addressees and is confidential. 
>> If you receive this message in error,or are not the intended recipient(s), 
>> please delete it and any copies from your systems and immediately notify
>> the sender. Any unauthorized view, use that does not comply with its 
>> purpose, 
>> dissemination or disclosure, either whole or partial, is prohibited. Since 
>> the internet 
>> cannot guarantee the integrity of this message which may not be reliable, 
>> BNP PARIBAS 
>> (and its subsidiaries) shall not be liable for the message if modified, 
>> changed or falsified. 
>> Do not print this message unless it is necessary,consider the environment.
>> 
>> --
>> 
>> Ce message et toutes les pieces jointes (ci-apres le "message") 
>> sont etablis a l'intention exclusive de ses destinataires et sont 
>> confidentiels.
>> Si vous recevez ce message par erreur ou s'il ne vous est pas destine,
>> merci de le detruire ainsi que toute copie de votre systeme et d'en avertir
>> immediatement l'expediteur. Toute lecture non autorisee, toute utilisation 
>> de 
>> ce message qui n'est pas conforme a sa destination, toute diffusion ou toute 
>> publication, totale ou partielle, est interdite. L'Internet ne permettant 
>> pas d'assurer
>> l'integrite de ce message electronique susceptible d'alteration, BNP Paribas 
>> (et ses filiales) decline(nt) toute responsabilite au titre de ce message 
>> dans l'hypothese
>> ou il aurait ete modifie, deforme ou falsifie. 
>> N'imprimez ce message que si necessaire, pensez a l'environnement.
>> 
>> 

RE: Cassandra backup via snapshots in production

2014-11-27 Thread Ngoc Minh VO
Thanks a lot for your answers!

What we plan to do is:

-  set auto_snapshot = true
-  if a human error happened on D-5:
   -  bring the cluster offline
   -  purge all data
   -  import the snapshots taken prior to D-5 (and delete the snapshots taken after D-5)
   -  upload all missing data between D-5 and D
   -  bring the cluster online

Do you think it would work?
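
The snapshot-selection step of that plan can be sketched as follows. This is only an illustration with made-up dates; the function name `split_snapshots` is hypothetical, and treating a snapshot taken on the incident day itself as suspect is an assumption, not something stated in the plan.

```python
from datetime import date


def split_snapshots(snapshot_days, incident_day):
    """Return (day_to_restore_from, days_to_delete) around an incident."""
    before = sorted(day for day in snapshot_days if day < incident_day)
    suspect = sorted(day for day in snapshot_days if day >= incident_day)
    # Restore from the most recent snapshot taken strictly before the incident.
    restore_from = before[-1] if before else None
    return restore_from, suspect


# Illustrative dates only: an incident on Nov 22 with four snapshots available.
snapshots = [date(2014, 11, 20), date(2014, 11, 22),
             date(2014, 11, 24), date(2014, 11, 26)]
restore_from, to_delete = split_snapshots(snapshots, date(2014, 11, 22))
print(restore_from)  # 2014-11-20
```

Everything written between `restore_from` and today would then have to be re-imported from the other data source, so the replay window grows with the gap between snapshots.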

From: Jens Rantil [mailto:jens.ran...@tink.se]
Sent: Tuesday, 25 November 2014 10:03
To: user@cassandra.apache.org
Subject: Re: Cassandra backup via snapshots in production

> Truncate does trigger snapshot creation though

Doesn’t it? With “auto_snapshot: true” it should.



On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

True

Delete in CQL just creates a tombstone, so from the storage engine's point of view it's just 
adding some physical columns

Truncate does trigger snapshot creation though

On 21 Nov 2014 at 19:29, "Robert Coli" <rc...@eventbrite.com> wrote:

On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil <jens.ran...@tink.se> wrote:
> The main purpose is to protect us from human errors (eg. unexpected 
> manipulations: delete, drop tables, …).

If that is the main purpose, having "auto_snapshot: true” in cassandra.yaml 
will be enough to protect you.

OP includes "delete" in their list of "unexpected manipulations", and 
auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba





Re: Cassandra backup via snapshots in production

2014-11-25 Thread Jens Rantil
> Truncate does trigger snapshot creation though

Doesn’t it? With “auto_snapshot: true” it should.


On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan  wrote:

> True
> Delete in CQL just creates a tombstone, so from the storage engine's point of view it's
> just adding some physical columns
> Truncate does trigger snapshot creation though
> On 21 Nov 2014 at 19:29, "Robert Coli" wrote:
>> On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil  wrote:
>>
>>> > The main purpose is to protect us from human errors (eg. unexpected
>>> manipulations: delete, drop tables, …).
>>>
>>> If that is the main purpose, having "auto_snapshot: true” in
>>> cassandra.yaml will be enough to protect you.
>>>
>>
>> OP includes "delete" in their list of "unexpected manipulations", and
>> auto_snapshot: true will not protect you in any way from DELETE.
>>
>> =Rob
>> http://twitter.com/rcolidba
>>

Re: Cassandra backup via snapshots in production

2014-11-25 Thread DuyHai Doan
True

Delete in CQL just creates a tombstone, so from the storage engine's point of view it's
just adding some physical columns

Truncate does trigger snapshot creation though
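
The distinction can be written down as a small lookup. This is a sketch of the behaviour discussed in this thread, not of Cassandra's code; the names `AUTO_SNAPSHOT_TRIGGERS` and `recoverable_from_auto_snapshot` are invented for the example, and it assumes `auto_snapshot: true` is set in cassandra.yaml.

```python
# Whether a statement makes Cassandra take an automatic snapshot first,
# assuming `auto_snapshot: true` in cassandra.yaml.
AUTO_SNAPSHOT_TRIGGERS = {
    "TRUNCATE": True,   # data is snapshotted before truncation
    "DROP": True,       # dropping a table or keyspace snapshots it first
    "DELETE": False,    # only writes a tombstone; nothing is snapshotted
}


def recoverable_from_auto_snapshot(statement):
    """Return True if the statement would have left an automatic snapshot."""
    words = statement.split()
    if not words:
        return False
    # Anything not listed (INSERT, UPDATE, ...) triggers no snapshot.
    return AUTO_SNAPSHOT_TRIGGERS.get(words[0].upper(), False)
```

An accidental `DELETE` or overwrite therefore needs a scheduled snapshot or an off-node backup to recover from, which is the gap the backup tools in this thread are meant to fill.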
On 21 Nov 2014 at 19:29, "Robert Coli" wrote:

> On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil  wrote:
>
>> > The main purpose is to protect us from human errors (eg. unexpected
>> manipulations: delete, drop tables, …).
>>
>> If that is the main purpose, having "auto_snapshot: true” in
>> cassandra.yaml will be enough to protect you.
>>
>
> OP includes "delete" in their list of "unexpected manipulations", and
> auto_snapshot: true will not protect you in any way from DELETE.
>
> =Rob
> http://twitter.com/rcolidba
>


Re: Cassandra backup via snapshots in production

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil  wrote:

> > The main purpose is to protect us from human errors (eg. unexpected
> manipulations: delete, drop tables, …).
>
> If that is the main purpose, having "auto_snapshot: true” in
> cassandra.yaml will be enough to protect you.
>

OP includes "delete" in their list of "unexpected manipulations", and
auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba


Re: Cassandra backup via snapshots in production

2014-11-21 Thread Jens Rantil
> The main purpose is to protect us from human errors (eg. unexpected 
> manipulations: delete, drop tables, …).

If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml 
will be enough to protect you.

Regarding backup, I have a small script that creates a named snapshot and then, 
for each sstable, encrypts it, uploads it to S3, and deletes the snapshotted 
sstable. It took me an hour to write and roll out to all our nodes. The whole 
process is currently logged, but eventually I will also send an e-mail if a 
backup fails.



On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO 
wrote:

> Hello all,
> We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 
> nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
> The main purpose is to protect us from human errors (eg. unexpected 
> manipulations: delete, drop tables, …).
> We are thinking of:
> -  Backup: add a 2TB HDD on each node for C* daily/weekly snapshots.
> -  Restore: load the most recent snapshots or latest “non-corrupted” 
> ones and replay missing data imports from other data source.
> We would like to know if anybody is using Cassandra’s backup feature in 
> production and could share their experience with us.
> Your help would be greatly appreciated.
> Best regards,
> Minh

Re: Cassandra backup via snapshots in production

2014-11-19 Thread Robert Coli
On Tue, Nov 18, 2014 at 6:50 AM, Ngoc Minh VO 
wrote:

>   We are looking for a solution to backup data in our C* cluster (v2.0.x,
> 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
>
> The main purpose is to protect us from human errors (eg. unexpected
> manipulations: delete, drop tables, …).
>

https://github.com/JeremyGrosser/tablesnap

=Rob