Hi,
We have a Cassandra column family containing 320 million rows,
and each row contains about 15 columns. We want to take a monthly dump of
this single column family in text format.
We are planning to take the following approach to implement this:
1.
The best way to generate dumps from Cassandra is via the Hadoop integration (or
Spark). You can find more info here:
http://www.datastax.com/documentation/cassandra/2.1/cassandra/configuration/configHadoop.html
http://wiki.apache.org/cassandra/HadoopSupport
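Part of why the Hadoop/Spark route scales to 320 million rows is that the full scan is split by token range, so many tasks can each read one slice of the ring in parallel. Below is a minimal sketch of that splitting idea. It assumes the Murmur3Partitioner (64-bit signed tokens) and a CQL3 table. The keyspace, table, and partition-key names are hypothetical placeholders. This is not the actual code of Cassandra's Hadoop input format.

```python
# Murmur3Partitioner tokens are 64-bit signed integers (assumption: this
# cluster uses Murmur3Partitioner rather than RandomPartitioner).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(num_splits):
    """Divide the whole token range into num_splits contiguous (start, end] ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic keeps the boundaries exact; the last bound is MAX_TOKEN.
    bounds = [MIN_TOKEN + (span * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(keyspace, table, partition_key, num_splits):
    """Build one CQL query per token range; each can run in a separate task."""
    return [
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} AND token({partition_key}) <= {end}"
        for start, end in split_token_ring(num_splits)
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute your own keyspace/column family/key column.
    for query in range_queries("my_keyspace", "my_cf", "row_key", 4):
        print(query)
```

Each task would then write its slice of rows out as text. In practice the real integration also aligns splits with the ranges each node owns, so reads stay local.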
On Thu, Oct 9, 2014 at 4:19 AM, Gaurav
You might also want to consider tools like
https://github.com/Netflix/aegisthus for the last step, which can help you
deal with tombstones and de-duplicate data.
Thanks,
Daniel
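To make the tombstone/de-duplication point concrete: when the same column appears in several SSTables, the version with the highest write timestamp wins, and a tombstone (deletion) that wins means the column should not appear in the dump at all. Below is a minimal sketch of that reconciliation under simplifying assumptions. It assumes cells arrive as (row_key, column, value, timestamp) tuples with value=None marking a tombstone, and it ignores row-level deletes and TTLs. It is an illustration of the idea only, not Aegisthus's code or API.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# One cell as read from some SSTable: (row_key, column, value, write_timestamp).
# value=None stands for a column tombstone (a simplification for this sketch).
Cell = Tuple[str, str, Optional[str], int]


def reconcile(cells: Iterable[Cell]) -> Dict[str, Dict[str, str]]:
    """Keep only the newest version of each (row, column) and drop deleted columns."""
    latest: Dict[Tuple[str, str], Tuple[int, Optional[str]]] = {}
    for row_key, column, value, ts in cells:
        key = (row_key, column)
        current = latest.get(key)
        # Last write wins; on an exact timestamp tie this sketch lets the
        # tombstone win. Ties between two live values keep the first one seen.
        if current is None or ts > current[0] or (ts == current[0] and value is None):
            latest[key] = (ts, value)
    rows: Dict[str, Dict[str, str]] = {}
    for (row_key, column), (_, value) in latest.items():
        if value is not None:  # deleted columns are left out of the dump
            rows.setdefault(row_key, {})[column] = value
    return rows


def to_text_lines(rows: Dict[str, Dict[str, str]]) -> Iterator[str]:
    """Render each row as one tab-separated text line: key, then column=value pairs."""
    for row_key in sorted(rows):
        cols = rows[row_key]
        yield "\t".join([row_key] + [f"{c}={cols[c]}" for c in sorted(cols)])


if __name__ == "__main__":
    sample = [
        ("k1", "a", "old", 1),
        ("k1", "a", "new", 2),   # newer write replaces "old"
        ("k1", "b", "x", 1),
        ("k1", "b", None, 3),    # later tombstone deletes column b
    ]
    for line in to_text_lines(reconcile(sample)):
        print(line)
```

A row whose columns are all deleted simply disappears from the output, which is usually what you want in a monthly text dump.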
On Thu, Oct 9, 2014 at 12:19 AM, Gaurav Bhatnagar gbhatna...@gmail.com
wrote: