RE: Replication lag between data center

SEAN_R_DURITY Thu, 19 May 2016 06:06:51 -0700

Just wanted to chime in that this is a very well-written and explained answer. 
Nice job, Jeff!

Sean Durity
From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Wednesday, May 18, 2016 11:41 PM
To: user@cassandra.apache.org
Subject: Re: Replication lag between data center

Cassandra isn’t a traditional DB – it doesn’t “replicate” in the same way that 
a relational DB replicates.

Cassandra clients send mutations (via native protocol or thrift). Those 
mutations include a minimum consistency level for the server to return a 
successful write.

If a write says “Consistency: ALL” - then as soon as the write returns, the 
mutation exists on all replicas for that data (no replication delay – it’s done).
If a write is anything other than ALL, it’s possible that any individual node 
may not have the write when the client is told the write succeeds. At that 
point, the coordinator will make a best effort to deliver the write to all 
nodes in real time, but may fail or time out. As far as I know, there are no 
metrics on this delivery – I believe the writes prior to the coordinator 
returning may have some basic data in TRACE, but wouldn’t expect writes after 
the coordinator returned to have tracing data available.
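
To make the consistency level point concrete, here is a minimal sketch of how many replica acknowledgements the coordinator waits for before it reports success, and how many replicas may therefore still be missing the write at that moment. The function names are hypothetical (they are not part of any Cassandra driver), and only ONE, QUORUM and ALL are covered, with the replication factor taken as the total across all data centers.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed before the write is reported successful."""
    cl = consistency_level.upper()
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if cl == "ALL":
        return replication_factor
    raise ValueError(f"consistency level not covered by this sketch: {consistency_level}")


def replicas_possibly_behind(consistency_level: str, replication_factor: int) -> int:
    """Upper bound on replicas that may not hold the write when the client is told it succeeded."""
    return replication_factor - required_acks(consistency_level, replication_factor)


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_acks(cl, 6), replicas_possibly_behind(cl, 6))
```

With a total replication factor of 6 (for example 3 per DC in two DCs), a QUORUM write returns after 4 acknowledgements, so up to 2 replicas may still be catching up.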

If any individual node times out completely, the coordinator writes a hint. When the 
coordinator sees the node come back online, it will try to replay the writes by 
replaying the hints – this may happen minutes or hours later.
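
The hint mechanism can be illustrated with a toy model. This is only a sketch of the idea, not Cassandra's implementation: the `ToyCoordinator` class and the replica names are made up, and it ignores write timestamps, hint expiry and timeouts, all of which matter in a real cluster.

```python
class ToyCoordinator:
    """Toy model: store a hint for each down replica, replay it once the replica is back."""

    def __init__(self, replicas):
        self.data = {name: {} for name in replicas}   # per-replica key/value store
        self.up = {name: True for name in replicas}   # liveness as seen by the coordinator
        self.hints = []                               # pending (replica, key, value) tuples

    def write(self, key, value):
        """Deliver the write to live replicas, hint the rest; return the number of acks."""
        acks = 0
        for name in self.data:
            if self.up[name]:
                self.data[name][key] = value
                acks += 1
            else:
                self.hints.append((name, key, value))
        return acks

    def replay_hints(self):
        """Replay hints for replicas that are up again; keep the others pending."""
        remaining = []
        for name, key, value in self.hints:
            if self.up[name]:
                self.data[name][key] = value
            else:
                remaining.append((name, key, value))
        self.hints = remaining


if __name__ == "__main__":
    coord = ToyCoordinator(["dc1-a", "dc1-b", "dc2-a"])
    coord.up["dc2-a"] = False
    print(coord.write("k", "v"))      # acks from the two live replicas
    coord.up["dc2-a"] = True
    coord.replay_hints()
    print(coord.data["dc2-a"])
```

The time between the original write and the replay is the "lag" for that replica, and it is driven by how long the node or link was unavailable rather than by any fixed replication interval.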

If it’s unable to replay hints, or if writes are missed for some other reason, 
the data may never “replicate” to the other nodes/DCs on its own – you may need 
to manually “replicate” it using the `nodetool repair` tool.

Taken together, there’s no simple “replication lag” here – if you write with 
ALL, the lag is “none”. If you write with CL:QUORUM and read with CL:QUORUM, 
your effective lag is “probably none”, because missing replicas will 
read-repair the data on read. If you read or write with low consistency, your 
lag may be milliseconds, hours, weeks, or forever, depending on how long your 
link is down and how often you repair.
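
The QUORUM/QUORUM case comes down to simple arithmetic: if the number of replicas that acknowledged the write plus the number consulted on the read exceeds the replication factor, the two sets must share at least one replica. A brute-force check of that claim, as a sketch with a hypothetical helper name:

```python
from itertools import combinations


def always_overlap(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True if every possible set of written replicas intersects every possible set of read replicas."""
    replicas = range(replication_factor)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, write_acks)
        for read in combinations(replicas, read_acks)
    )


if __name__ == "__main__":
    rf = 3
    print(always_overlap(rf, 2, 2))  # QUORUM write + QUORUM read
    print(always_overlap(rf, 1, 1))  # ONE write + ONE read
    print(always_overlap(rf, 3, 1))  # ALL write + ONE read
```

For RF=3, QUORUM/QUORUM and ALL/ONE always overlap, while ONE/ONE does not, which is why low consistency on both sides leaves the lag unbounded until hints or repair catch up.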

From: cass savy
Reply-To: user@cassandra.apache.org
Date: Wednesday, May 18, 2016 at 8:03 PM
To: user@cassandra.apache.org
Subject: Replication lag between data center

How can we determine/measure the replication lag or latency between on-premises 
data centers or across regions/availability zones?

