Cassandra datacenters replication advanced usage

Fabrice Douchant Tue, 02 Jun 2015 04:07:57 -0700

Hi everyone.

For a project, we use a Cassandra cluster to get fast reads and writes on a 
large amount of generated (column-oriented) data.


Until now, we have only had one datacenter, for prototyping.

We now plan to split our cluster into 2 datacenters to meet performance 
requirements (data transfer between the two datacenters is quite slow):

datacenter #1: located near our data producer services; intensively writes 
all data to Cassandra periodically (each write has a "run_id" column in its 
primary key)
datacenter #2: located near our data consumer services; intensively reads all 
data produced by datacenter #1 for a given "run_id".
However, we would like our consumer services to access data only in the 
datacenter near them (datacenter #2), and only once all data for a given 
"run_id" (generated by the producer services) has been completely replicated 
from datacenter #1.

My question is: how can we ensure that all data has been replicated to 
datacenter #2 before telling the consumer services (near datacenter #2) to 
start using it?

Our best solutions so far (but still not good enough :-P):

producer services (datacenter #1) write at consistency level "ALL". But this 
leads to poor tolerance of network partitions AND really bad write performance.
producer services (datacenter #1) write at consistency level "LOCAL_QUORUM", 
and a final "run finished" value could be written at consistency level "ALL". 
But it seems Cassandra does not guarantee replication ordering.
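The ordering problem with the second approach can be illustrated with a toy, stdlib-only Python model: rows written at LOCAL_QUORUM reach datacenter #2 independently of one another, so a marker acknowledged at ALL only proves that the marker itself is in datacenter #2. The keys, timings and the dropped-then-replayed row are invented for illustration, and the model simplifies by assuming all dc2 replicas see a given write at the same moment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    key: str
    sent_at: float         # time the producer in dc1 issued the write
    dc2_visible_at: float  # assumed time the write becomes visible on dc2 replicas


def visible_in_dc2(writes: List[Write], t: float) -> List[str]:
    """Keys a read in dc2 could see at time t (simplified model)."""
    return [w.key for w in writes if w.dc2_visible_at <= t]


# Hypothetical run: rows written at LOCAL_QUORUM, so the producer is acknowledged
# by dc1 alone; dc2 receives each row independently, not in issue order.
rows = [
    Write("run42/row1", sent_at=0.0, dc2_visible_at=0.4),
    Write("run42/row2", sent_at=1.0, dc2_visible_at=1.3),
    # Assume this mutation was dropped on the inter-datacenter link and only
    # delivered later (e.g. by hint replay or repair).
    Write("run42/row3", sent_at=2.0, dc2_visible_at=9.0),
]

# The "run finished" marker is written at ALL, after every row was issued: its
# acknowledgement means every replica, including dc2's, has the marker itself,
# and nothing more.
marker = Write("run42/finished", sent_at=3.0, dc2_visible_at=3.2)

if __name__ == "__main__":
    t = 4.0  # a consumer near dc2 sees the marker and starts reading
    assert all(w.sent_at < marker.sent_at for w in rows)
    assert marker.dc2_visible_at <= t
    seen = visible_in_dc2(rows, t)
    missing = [w.key for w in rows if w.key not in seen]
    print("marker visible, rows missing in dc2:", missing)
```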
Do you have any suggestions?

Thanks a lot,

Fabrice
