Re: What does ReadRepair exactly do?
manuzhang wrote:
> read quorum doesn't mean we read newest values from a quorum number of replicas but to ensure we read at least one newest value as long as write quorum succeeded beforehand and W + R > N.

I beg to differ here. Any read or write, by the definition of quorum, should have at least n/2 + 1 replicas that agree on the value being read or written. Responding to the user with a newer value, even if the write that created that value hasn't completed, cannot guarantee any read consistency.

Hiller, Dean wrote:
> Kind of an interesting question. I think you are saying: if a client read resolved only the two nodes (as said in Aaron's email) back to the client, read-repair was kicked off because of the inconsistent values, and the write did not complete yet, then I guess you would need two nodes to go down right after the read, and before the write finished, to lose the value, such that the client read a value that was never stored in the database. The odds of two nodes going out are pretty slim, though.
> Thanks,
> Dean

Bingo! I do understand that the odds of a quorum of nodes going down are low, and that any subsequent read would achieve a quorum. However, I'm wondering what the right thing to do here is, given that the client has specifically asked for a certain consistency on the read and Cassandra returns a value that doesn't have that consistency.

The heart of the problem is that the coordinator responds to a client request assuming that consistency has been achieved the moment it issues a row repair with the superset of the resolved value, without receiving acknowledgement from the replicas that the repair succeeded for the given consistency constraint. To adhere to the requested consistency, a read that triggers a row repair should repeat the read after issuing the repair, to verify that the consistency level is actually met. As Manu mentioned, this could of course lead to a number of repeated reads if writes arrive quickly, until the read times out.
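A minimal Python sketch of the retry idea proposed in this message, assuming each replica holds a single cell reconciled by last-write-wins on its write timestamp. `Cell`, `resolve`, `read_until_consistent` and `apply_repair` are hypothetical names for illustration; this is not Cassandra's coordinator code. The loop only returns a value once at least `block_for` replicas hold it, repairing and re-reading otherwise, and gives up with a timeout.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; the higher one wins


def resolve(cells):
    """Last-write-wins: pick the cell with the highest timestamp."""
    return max(cells, key=lambda c: c.timestamp)


def read_until_consistent(read_replicas, repair, block_for, timeout_s=1.0):
    """Hypothetical coordinator loop for the retry proposal.

    read_replicas() returns the cells held by the contacted replicas and
    repair(cell) pushes the resolved cell to them. A value is returned only
    once at least block_for replicas hold it; otherwise the read times out.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        cells = read_replicas()
        winner = resolve(cells)
        if sum(1 for c in cells if c == winner) >= block_for:
            return winner
        repair(winner)  # issue the repair, then loop to re-read and confirm it
    raise TimeoutError("consistency level not reached before the timeout")


# Usage: two replicas disagree; the repair lands, so the second read succeeds.
replicas = {"node1": Cell("val1", 1), "node2": Cell("val2", 2)}


def apply_repair(cell):
    for node, held in list(replicas.items()):
        if cell.timestamp > held.timestamp:  # a repair never overwrites newer data
            replicas[node] = cell


print(read_until_consistent(lambda: list(replicas.values()), apply_repair, block_for=2))
# Cell(value='val2', timestamp=2)
```

If writes keep landing faster than repairs, the loop never sees `block_for` matching replicas and ends in `TimeoutError`, which is the trade-off discussed in this message.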
However, note that we would still be honoring the consistency constraint for that read.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583400.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: What does ReadRepair exactly do?
Hiller, Dean wrote:
> In general it is okay to get the older or newer value. If you are reading 2 rows however instead of one, that may change.

This is certainly interesting, as it could mean that the user sees a value that never met the required consistency. For instance, with 3 replicas R1, R2, R3 and QUORUM consistency, assume that R1 initiates a read (becoming the coordinator), notices a conflict with R2 (assume R1 has the more recent value), and initiates a read repair with its value. Meanwhile, R2 and R3 each receive a different write, both with newer values than the one computed by the read repair. If R1 responds to the user with the value it computed at the time of the read repair, wouldn't that be a value that never met the consistency constraint?

I was wondering whether this should trigger another round of repair that tries to reach the consistency constraint with a newer value, or else time out, which is the expected outcome when the required consistency is not met. Please let me know if I'm missing something here.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583366.html
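To make the R1/R2/R3 scenario concrete, here is a small Python model of it. It assumes each replica holds a single cell reconciled by last-write-wins on its timestamp; the values `vA`, `vB`, `vC`, their timestamps, and the `lww` helper are hypothetical, chosen only so that the two concurrent writes are newer than the repaired value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int


def lww(a, b):
    """Last-write-wins merge of two versions of the same cell."""
    return a if a.timestamp >= b.timestamp else b


QUORUM = 3 // 2 + 1  # 2 of 3 replicas

replicas = {"R1": Cell("vA", 2), "R2": Cell("v0", 1), "R3": Cell("v0", 1)}

# R1 coordinates a QUORUM read from R1 and R2 and resolves the newer cell.
resolved = lww(replicas["R1"], replicas["R2"])

# Before the read repair reaches R2, two newer writes land on R2 and R3.
replicas["R2"] = lww(replicas["R2"], Cell("vB", 3))
replicas["R3"] = lww(replicas["R3"], Cell("vC", 4))

# The read repair finally arrives at R2; last-write-wins discards it.
replicas["R2"] = lww(replicas["R2"], resolved)

holders = [name for name, cell in replicas.items() if cell == resolved]
print(resolved.value, holders, len(holders) >= QUORUM)  # vA ['R1'] False
```

Under these assumptions the value R1 resolved is held by only one replica, which is the situation the question describes.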
Re: What does ReadRepair exactly do?
Hiller, Dean wrote:
> I guess one more thing: I completely ignore your second write, mainly because I assume it comes after we already read. So let's say your current state is
>
> node1 = val1
> node2 = val1
> node3 = val1
>
> You do a QUORUM write of val2, which is IN the middle!!!
>
> node1 = val1
> node2 = val2
> node3 = val1 (NOTICE the write is not complete yet)
>
> If you read from node1 and node3, you get val1. If you read from node1 and node2, you get val2, as a read repair will happen. I.e., you always get either the older value or the newer value.
>
> If you have two writes come in like so:
>
> node1 = val1
> node2 = val2
> node3 = val3
>
> Well, I think you can figure out what happens when you do a read ;). If your read quorum reads from node1 and node3, you get val3, etc. etc. This is basically how it works. If your scenario is a web page, a user simply hits the refresh button and sees the values changing.
>
> I'm extending your example
>
> Later,
> Dean

Thanks for the example, Dean. This definitely clears things up when a read and a write overlap and one comes after the other. What I'm still missing is how read repairs behave. Extending your example to the following case:

1. node1 = val1; node2 = val1; node3 = val1
2. You do a write operation (W1) at QUORUM with val2:
   node1 = val1; node2 = val2; node3 = val1 (write val2 is not complete yet)
3. Now a read (R1) from node1 and node2 initiates a read repair that needs to write val2 to node1:
   node1 = val1; node2 = val2; node3 = val1 (read repair of val2 is not complete yet)
4. Say that, in the meantime, node1 receives a write of val4. The read repair for R1 then arrives at node1 but sees the newer value val4:
   node1 = val4; node2 = val2; node3 = val1 (write val4 is not complete; read repair of val2 is not complete)

In this case, for read R1, the value val2 does not have a quorum. Would read R1 return val2 or val4?
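The four steps can be replayed in a small Python model, assuming last-write-wins reconciliation by write timestamp (the rule Cassandra uses to pick between versions of a cell). The timestamps 1, 2 and 4 and the helpers `lww` and `quorum_read` are hypothetical. The sketch only shows the replica state after step 4; it does not claim to answer which value the coordinator actually returns for R1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int


def lww(*cells):
    """Last-write-wins: the cell with the highest timestamp wins."""
    return max(cells, key=lambda c: c.timestamp)


def quorum_read(state, *nodes):
    """Resolve the versions returned by the contacted replicas."""
    return lww(*(state[n] for n in nodes))


# Step 1: every replica holds val1.
state = {n: Cell("val1", 1) for n in ("node1", "node2", "node3")}

# Step 2: write W1 (val2) has reached node2 only.
state["node2"] = lww(state["node2"], Cell("val2", 2))
print(quorum_read(state, "node1", "node3").value)  # val1 (the older value)
print(quorum_read(state, "node1", "node2").value)  # val2 (the newer value)

# Step 3: read R1 from node1 + node2 resolves val2 and schedules a repair of node1.
r1 = quorum_read(state, "node1", "node2")
pending_repair = ("node1", r1)

# Step 4: a newer write (val4) lands on node1 before the repair does.
state["node1"] = lww(state["node1"], Cell("val4", 4))
node, cell = pending_repair
state[node] = lww(state[node], cell)  # the repair's older timestamp loses

print(state["node1"].value)  # val4
print(sum(c == r1 for c in state.values()))  # 1 (only node2 holds val2)
```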
Zhang, Manu wrote:
> And we don't send read requests to all of the three replicas (R1, R2, R3) if CL=QUORUM; just 2 of them, depending on proximity.

Thanks, Zhang. But this again seems a little strange, since one (say R2) of the 2 closest replicas (say R1 and R2) might be down, resulting in a read failure even though enough replicas (R1 and R3) are still live to satisfy the read.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583372.html
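The quorum arithmetic behind this exchange can be checked in a few lines of Python: with replication factor N, QUORUM needs floor(N/2) + 1 replicas, and every write set of size W intersects every read set of size R exactly when W + R > N. `pick_read_targets` is a hypothetical illustration of the selection the message argues for (closest replicas among those believed alive); whether the coordinator filters this way is not settled in this thread.

```python
from itertools import combinations


def quorum(n):
    """Replicas needed for QUORUM with replication factor n: floor(n/2) + 1."""
    return n // 2 + 1


def quorums_always_overlap(n, w, r):
    """True if every w-replica write set shares a replica with every r-replica read set."""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))


def pick_read_targets(by_proximity, alive, n):
    """Hypothetical selection: the closest quorum(n) replicas believed alive."""
    live = [r for r in by_proximity if r in alive]
    need = quorum(n)
    if len(live) < need:
        raise RuntimeError("not enough live replicas for QUORUM")
    return live[:need]


print(quorum(3))                         # 2
print(quorums_always_overlap(3, 2, 2))   # True  (W + R = 4 > 3)
print(quorums_always_overlap(3, 1, 2))   # False (W + R = 3, not > 3)
print(pick_read_targets(["R1", "R2", "R3"], {"R1", "R3"}, 3))  # ['R1', 'R3']
```

Taking the two closest replicas without the liveness filter would give `['R1', 'R2']`, which includes the down replica from the example.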
Re: What does ReadRepair exactly do?
Hello,

This conversation targets precisely a question I have had for a while; I would be grateful if someone could clarify it a little further. Considering the case of a repair triggered by a consistency constraint (the first case in the discussion above), would the following interpretation be correct?

1. A digest mismatch exception is raised even if only one among the many responses differs (even if consistency is met on an out-of-date value, say by virtue of timestamp).
2. The callback initiates a read to fetch data from all replicas.
3. Resolve() is invoked to find the deltas for each replica that was out of date.
4. ReadRepair is scheduled for the above replicas.
5. A normal read is performed to check whether it meets the consistency constraint. Mismatches would trigger a repair again.

Assuming the above is true, would the mutations in step 4 and the read in step 5 happen in parallel? In other words, would the time taken by the read correction be the round trip between the coordinator and its farthest replica that meets the consistency constraint?

Thanks,
Shankar

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583352.html
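Steps 1 to 4 of this interpretation can be sketched in Python under simplifying assumptions: one cell per replica, an MD5 hash of the cell standing in for a digest response, and last-write-wins resolution. `digest` and `coordinate_read` are hypothetical names, not Cassandra's classes. Step 5 and the question of whether it overlaps with the repair mutations are deliberately not modeled.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int


def digest(cell):
    """Stand-in for a replica's digest response: a hash of the cell's contents."""
    return hashlib.md5(f"{cell.value}@{cell.timestamp}".encode()).hexdigest()


def coordinate_read(replicas, data_node, digest_nodes):
    """Hypothetical walk through steps 1-4 of the interpretation above.

    Returns the resolved cell and a {replica: cell} map of repairs to send.
    """
    data = replicas[data_node]
    # Step 1: a single differing digest is enough to count as a mismatch.
    if all(digest(replicas[n]) == digest(data) for n in digest_nodes):
        return data, {}
    # Step 2: on a mismatch, fetch the full data from every replica.
    full = dict(replicas)
    # Step 3: resolve to the newest version (last-write-wins by timestamp).
    resolved = max(full.values(), key=lambda c: c.timestamp)
    # Step 4: schedule a repair for every replica that differs from the resolved version.
    repairs = {node: resolved for node, cell in full.items() if cell != resolved}
    return resolved, repairs


replicas = {"n1": Cell("v2", 2), "n2": Cell("v1", 1), "n3": Cell("v2", 2)}
print(coordinate_read(replicas, "n1", ["n3"]))  # digests match: no repairs
print(coordinate_read(replicas, "n1", ["n2"]))  # mismatch: n2 gets a repair
```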
Re: What does ReadRepair exactly do?
manuzhang wrote:
> why repair again? We block until the consistency constraint is met. Then the latest version is returned and repair is done asynchronously if there is any mismatch. We may retry the read if fewer columns than required are returned.

Just to make sure I understand you correctly: consider the case where a read repair is in flight and a subsequent write affects one or more of the replicas that were scheduled to receive the repair mutations. In this case, are you saying that we return the older version to the user rather than the latest version produced by the write?

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583355.html
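The behavior manuzhang describes (block for enough responses, return the newest, repair stale replicas in the background) can be sketched in Python, using a `ThreadPoolExecutor` as a hypothetical stand-in for the messaging layer. `read_blocking` and `send_repair` are illustrative names; the retry on too few columns is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int


def read_blocking(pool, replicas, targets, block_for, send_repair):
    """Hypothetical coordinator: wait for block_for responses, return the newest,
    and schedule repairs for stale responders without waiting for them."""
    futures = [pool.submit(lambda n=n: (n, replicas[n])) for n in targets]
    responses = []
    for fut in as_completed(futures):
        responses.append(fut.result())
        if len(responses) >= block_for:
            break  # consistency level reached; stop waiting
    newest = max((cell for _, cell in responses), key=lambda c: c.timestamp)
    for node, cell in responses:
        if cell != newest:
            pool.submit(send_repair, node, newest)  # fire-and-forget repair
    return newest


replicas = {"n1": Cell("v1", 1), "n2": Cell("v2", 2)}
repaired = {}


def send_repair(node, cell):
    repaired[node] = cell


with ThreadPoolExecutor(max_workers=4) as pool:
    result = read_blocking(pool, replicas, ["n1", "n2"], 2, send_repair)

print(result)    # Cell(value='v2', timestamp=2)
print(repaired)  # {'n1': Cell(value='v2', timestamp=2)}
```

Because the repair is only submitted, not awaited, `result` is returned before the stale replica is updated; in this sketch the `with` block waits for the repair only because exiting it shuts the pool down.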