The easiest way to do this is to create a second Cassandra data center and point Spark at it, since Spark can operate directly on data in Cassandra. There is no impact on C* performance and no complex backup/restore process required: just let Cassandra replicate the data for you.
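To make the "point Spark at it" part concrete, here is a minimal sketch of reading a Cassandra table into a Spark DataFrame with the Spark Cassandra Connector. The keyspace/table names (`ks`, `events`), the host `dc2-node1`, and the `updated_at` column are hypothetical placeholders; it assumes the connector jar is on the Spark classpath and a reachable cluster.

```python
# Hedged sketch, not runnable without a live Cassandra cluster and the
# Spark Cassandra Connector on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-to-spark")
    # Point the connector at the analytics DC, not the operational one.
    # "dc2-node1" is a placeholder hostname.
    .config("spark.cassandra.connection.host", "dc2-node1")
    .getOrCreate()
)

# Read the table directly; "ks" and "events" are placeholder names.
df = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="ks", table="events")
    .load()
)

# For daily incremental loads, one option is filtering on a timestamp
# column your data model already has ("updated_at" is assumed here).
daily = df.filter(df.updated_at >= "2024-01-01")
```

Filtering on a clustering or indexed column lets the connector push the predicate down to Cassandra instead of scanning the whole table in Spark.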
If you need a scalable bulk…
I have a scenario where data has to be loaded into Spark nodes from two data
stores: Oracle and Cassandra. We did the initial load and found a way to do
daily incremental loads from Oracle to Spark.
I’m trying to figure out how to do this from C*. What tools are available in C*
to do incremental loads into Spark?