parallel processing - splitting data

Frank Hughes Thu, 19 Jan 2017 04:06:07 -0800

Hello there,

I'm running a 4 node cluster of Cassandra 3.9 with a replication factor of
4.


I want to run a Java process on each node that selects only 25% of the
data on that node,
so I can process all of the data in parallel across the nodes.

What is the best way to do this with the Java driver?

I was assuming I could retrieve the token ranges for each node and page
through the data using these ranges, but this includes the replicated data.
I was hoping there was a way of selecting only the data that a node is
primarily responsible for and avoiding the replicated data.
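
To make the idea concrete, here is a minimal sketch of what I have in mind. It
is not driver code: it assumes the ring has already been read into a map of
token -> node name (presumably from the driver's cluster metadata), and it
relies on the node that owns token t being the primary replica for the range
(previous token, t]. The class name, the node names, the token values and the
table ks.events with partition key id are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: derives each node's primary token ranges from the ring.
final class PrimaryRanges {

    /** A ring range (start, end]: start is exclusive, end is inclusive. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** Builds the token-restricted queries that cover this range. */
        List<String> toCql(String table, String partitionKey) {
            String tok = "token(" + partitionKey + ")";
            String prefix = "SELECT * FROM " + table + " WHERE " + tok;
            List<String> queries = new ArrayList<>();
            if (start < end) {
                queries.add(prefix + " > " + start + " AND " + tok + " <= " + end);
            } else {
                // The range wraps around the end of the ring, so it is
                // covered by two separate queries.
                queries.add(prefix + " > " + start);
                queries.add(prefix + " <= " + end);
            }
            return queries;
        }
    }

    /** Maps each node to the ranges for which it is the primary replica. */
    static Map<String, List<Range>> primaryRanges(TreeMap<Long, String> ring) {
        Map<String, List<Range>> result = new LinkedHashMap<>();
        if (ring.isEmpty()) {
            return result;
        }
        // The range ending at the lowest token starts at the highest token.
        long previous = ring.lastKey();
        for (Map.Entry<Long, String> entry : ring.entrySet()) {
            result.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                  .add(new Range(previous, entry.getKey()));
            previous = entry.getKey();
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up tokens for a 4 node ring with one token per node.
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(-6000L, "node1");
        ring.put(-1000L, "node2");
        ring.put(3000L, "node3");
        ring.put(8000L, "node4");

        primaryRanges(ring).forEach((node, ranges) -> {
            for (Range range : ranges) {
                for (String cql : range.toCql("ks.events", "id")) {
                    System.out.println(node + ": " + cql);
                }
            }
        });
    }
}
```

Each node would then run only the queries listed under its own name and page
through the results, so every partition is read once regardless of the
replication factor. With vnodes a node simply gets more, smaller ranges.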

Many thanks for any help and guidance,

Frank Hughes
