Periodic snapshots plus incremental backups are, I think, a pretty good way to get point-in-time restores. But you have to manage cleaning up the snapshots and incremental backups yourself. I believe tablesnap (https://github.com/JeremyGrosser/tablesnap) is a pretty decent approach for keeping each node's sstables synced to a location off the host (S3, in fact). I'm not sure how portable it is to other object storage services, however. S3 with a lifecycle policy that transitions to Glacier would likely be the most cost-effective option for long-term retention.
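For the clean-up part, here is a minimal Python sketch of pruning old incremental backup files on a node. It assumes the usual <data_dir>/<keyspace>/<table>/backups/ layout and that the files have already been shipped off the host. The function names and the retention window are made up for illustration, and it only looks at files directly inside a backups/ directory. It deliberately leaves snapshots alone: snapshot files are hardlinks that carry the original sstable's mtime, so they are probably better removed by name with nodetool clearsnapshot.

```python
import time
from pathlib import Path


def find_expired_backups(data_dir, max_age_days, now=None):
    """Return incremental backup files older than max_age_days, sorted by path.

    Only files matching <data_dir>/<keyspace>/<table>/backups/<file> are
    considered, so live sstables and snapshots are never returned.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    expired = [
        path
        for path in Path(data_dir).glob("*/*/backups/*")
        if path.is_file() and path.stat().st_mtime < cutoff
    ]
    return sorted(expired)


def prune_backups(data_dir, max_age_days, dry_run=True):
    """Delete expired backup files; with dry_run=True only report them."""
    expired = find_expired_backups(data_dir, max_age_days)
    for path in expired:
        if not dry_run:
            path.unlink()
    return expired
```

Something like prune_backups("/var/lib/cassandra/data", 7) would list what a 7-day retention would remove without deleting anything; pass dry_run=False only once the off-host copies have been verified, and keep the window longer than the gap between the snapshots you intend to restore from.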
On Thu, Jun 16, 2016 at 4:30 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Also if we talk about backup strategy for Cassandra Data then essentially
> there are couple of strategies that are adopted:
>
> 1. Incremental Backups. The old sstables will remain inside a backup
> directory and can be shipped to a storage location like AWS Glacier, etc.
> 2. Snapshotting : Hardlinks of sstables will get created. This is a very
> fast process and latest data is captured into sstables after flushing
> memtables, snapshots will be created in snapshots directory. But snapshot
> does not provide you the feature to go back to a certain point in time but
> incremental backups give you that feature.
>
> Depending on the use case, you can use 1 or 2 or both.
>
> On Fri, Jun 17, 2016 at 4:46 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> What kind of data are we talking here?
>> Is it time series data with infrequent updates and only inserts or
>> frequently updated data. How frequently is old data read. I ask this
>> because your Node size planning and Compaction Strategy will essentially
>> depend on these.
>>
>> I have known people go upto 3-5 TB per node if data is not updated
>> frequently.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Jun 17, 2016 at 4:31 AM, <vasu.no...@gmail.com> wrote:
>>
>>> Bhuvan,
>>>
>>> Thanks for the info but actually I'm not looking for migration strategy.
>>> just want to backup strategy and retention policy best practices
>>>
>>> Thanks,
>>> Vasu
>>>
>>> On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>
>>> Hi Vasu,
>>>
>>> Planet Cassandra has a documentation page for basic info about migrating
>>> to cassandra from MySQL. What to expect and what not to. It can be found
>>> here <http://planetcassandra.org/mysql-to-cassandra-migration/>.
>>>
>>> I had a look at this slide
>>> <http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch>
>>> a while back. It provides a pretty reliable 4 Phase Sync strategy, starting
>>> from Slide 31. Also the QA session of the talk is informative too -
>>> http://www.doanduyhai.com/blog/?p=1757.
>>>
>>> Best Regards,
>>> Bhuvan
>>>
>>> On Fri, Jun 17, 2016 at 4:03 AM, <vasu.no...@gmail.com> wrote:
>>>
>>>> Hi ,
>>>>
>>>> I'm from relational world recently started working on Cassandra. I'm
>>>> just wondering what is backup best practices for DB around 100 Tb with
>>>> multi DC setup.
>>>>
>>>>
>>>> Thanks,
>>>> Vasu