Re: What does ReadRepair exactly do?

2012-10-25 Thread shankarpnsn
manuzhang wrote
 read quorum doesn't mean we read newest values from a quorum number of
 replicas but to ensure we read at least one newest value as long as write
 quorum succeeded beforehand and W+R > N.

I beg to differ here. Any read/write, by definition of quorum, should have
at least n/2 + 1 replicas that agree on that read/write value. Responding to
the user with a newer value, even if the write creating the new value hasn't
completed, cannot guarantee any read consistency > 1.
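
For reference, the overlap property behind W + R > N in the quoted message can be checked by brute force. This is only a minimal sketch: the function name is hypothetical and not part of any Cassandra API. It enumerates every possible write set of size W and read set of size R over N replicas.

```python
from itertools import combinations

def quorums_always_overlap(n, w, r):
    """Return True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(ws) & set(rs)  # an empty intersection is falsy
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )

print(quorums_always_overlap(3, 2, 2))  # W + R = 4 > N = 3  -> True
print(quorums_always_overlap(3, 1, 2))  # W + R = 3, not > N -> False
```

The check only shows that a read quorum touches at least one replica that saw a completed quorum write; it says nothing about how many replicas currently agree on the value returned, which is the point disputed above.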


Hiller, Dean wrote
 Kind of an interesting question

 I think you are saying if a client read resolved only the two nodes as
 said in Aaron's email back to the client and read-repair was kicked off
 because of the inconsistent values and the write did not complete yet and
 I guess you would have two nodes go down to lose the value right after the
 read, and before write was finished such that the client read a value that
 was never stored in the database.  The odds of two nodes going out are
 pretty slim though.
 Thanks,
 Dean

Bingo! I do understand that the odds of a quorum of nodes going down are low
and that any subsequent read would achieve a quorum. However, I'm wondering
what would be the right thing to do here, given that the client has
specifically asked for a certain consistency on the read and Cassandra
returns a value that doesn't have that consistency. The heart of the problem
here is that the coordinator responds to a client request assuming that
the consistency has been achieved the moment it issues a row repair with the
super-set of the resolved value, without receiving acknowledgement of the
success of the repair from the replicas for the given consistency constraint.

In order to adhere to the given consistency specification, the row repair
(due to consistent reads) should repeat the read after issuing a
consistency repair to check whether the consistency is met. As Manu
mentioned, this could of course lead to a number of repeat reads if the
writes arrive quickly, until the read times out. However, note that we
would still be honoring the consistency constraint for that read.
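
The proposal above can be sketched roughly as follows. This is an illustration of the suggested behaviour only, not of what Cassandra does: the in-memory replica model, the function name and the `max_rounds` parameter (standing in for the read timeout) are all assumptions made for the example.

```python
def read_with_confirmed_repair(replicas, quorum, max_rounds=3):
    """Re-read after each repair until the contacted replicas agree.

    replicas: list of dicts like {"value": "val1", "ts": 1} (hypothetical model).
    max_rounds stands in for the read timeout.
    """
    for _ in range(max_rounds):
        contacted = replicas[:quorum]
        newest = max(contacted, key=lambda r: r["ts"])
        if all(r["ts"] == newest["ts"] for r in contacted):
            return newest["value"]  # a quorum of replicas now holds this value
        # Repair: push the newest cell to stale replicas, then loop to re-read.
        for r in contacted:
            if r["ts"] < newest["ts"]:
                r["value"], r["ts"] = newest["value"], newest["ts"]
    raise TimeoutError("consistency not confirmed before the read timed out")

replicas = [
    {"value": "val1", "ts": 1},
    {"value": "val2", "ts": 2},
    {"value": "val1", "ts": 1},
]
print(read_with_confirmed_repair(replicas, quorum=2))  # val2, after one repair round
```

In this toy model the repair is applied synchronously, so the second round always succeeds; with concurrent writes the loop could run until `max_rounds` is exhausted, which is the repeated-read cost mentioned above.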



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583400.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What does ReadRepair exactly do?

2012-10-24 Thread shankarpnsn
Hiller, Dean wrote
 in general it is okay to get the older or newer value.  If you are reading
 2 rows however instead of one, that may change.

This is certainly interesting, as it could mean that the user could see a
value that never met the required consistency. For instance with 3 replicas
R1,R2,R3 and a quorum consistency, assume that R1 is initiating a read
(becomes the coordinator) - notices a conflict with R2 (assume R1 has a more
recent value) and initiates a read repair with its value. Meanwhile R2 and
R3 have seen two different writes with newer values than what was computed
by the read repair. If R1 were to respond back to the user with the value
that was computed at the time of read repair, wouldn't it be a value that
never met the consistency constraint? I was thinking if this should trigger
another round of repair that tries to reach the consistency constraint with
a newer value or time-out, which is the expected case when you don't meet
the required consistency. Please let me know if I'm missing something here. 



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583366.html


Re: What does ReadRepair exactly do?

2012-10-24 Thread shankarpnsn
Hiller, Dean wrote
 I guess one more thing is I completely ignore your second write mainly
 because I assume it comes after we already read, so let's say your
 current state is
 
 node1 = val1 node2 = val1 node3 = val1
 
 You do a write quorum of val=2 which is IN the middle!!!
 
 node1 = val1 node2 = val2 node3 = val1  (NOTICE the write is not complete
 yet)
 
 If you read from node1 and node3, you get val1.  If you read from node1
 and node2, you get val2 as a read repair will happen.
 
 Ie. You always get the older value or newer value.
 
 If you have two writes come in like so
 
 node1 = val1 node2 = val2 and node3= val3
 
 Well, I think you can figure it out when you do a read ;).  If your read
 quorum reads from node1 and node3 , you get val3, etc. etc.
 
 This is basically how it works….If your scenario is a web page, a user
 simply hits the refresh button and sees the values changing. I'm extending
 your example 
 
 Later,
 Dean

Thanks for the example, Dean. This definitely clears things up when you have
an overlap between the read and the write, and one comes after the other.
I'm still missing how read repairs behave. Just extending your example for
the following case:

1. node1 = val1 node2 = val1 node3 = val1

2. You do a write operation (W1) with quorum of val=2
node1 = val1 node2 = val2 node3 = val1  (write val2 is not complete yet)

3. Now with a read (R1) from node1 and node2, a read repair will be
initiated that needs to write val2 on node 1.  
node1 = val1; node2 = val2; node3 = val1  (read repair val2 is not complete
yet)

4. Say, in the meanwhile node 1 receives a write val 4; Read repair for R1
now arrives at node 1 but sees a newer value val4.
node1 = val4; node2 = val2; node3 = val1  (write val4 is not complete, read
repair val2 not complete)

In this case, for read R1, the value val2 does not have a quorum. Would read
R1 return val2 or val4?
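
The four steps can be replayed in a small simulation. It rests on two assumptions that are not confirmed in this thread: the coordinator answers from the responses it has already collected, and each replica keeps whichever cell has the higher timestamp (last write wins). The timestamps 1, 2 and 4 and the helper `apply_mutation` are made up for the example.

```python
def apply_mutation(replica, value, ts):
    """Keep whichever cell has the higher timestamp (assumed last-write-wins)."""
    if ts > replica["ts"]:
        replica["value"], replica["ts"] = value, ts

# Step 1: all replicas hold val1.
node1 = {"value": "val1", "ts": 1}
node2 = {"value": "val1", "ts": 1}
node3 = {"value": "val1", "ts": 1}

# Step 2: write W1 (val2) has reached only node2 so far.
apply_mutation(node2, "val2", 2)

# Step 3: read R1 collects node1 and node2 and resolves to the newest cell.
responses = [dict(node1), dict(node2)]
resolved = max(responses, key=lambda r: r["ts"])

# Step 4: val4 lands on node1 before the read repair for R1 arrives there.
apply_mutation(node1, "val4", 4)
apply_mutation(node1, resolved["value"], resolved["ts"])  # repair loses to val4

print(resolved["value"], node1["value"])  # val2 val4
```

Under those assumptions R1 would return val2, and the late repair would leave val4 in place on node1 rather than overwrite it.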


Zhang, Manu wrote
 And we don't send read request to all of the three replicas (R1, R2, R3)
 if CL=QUORUM; just 2 of them depending on proximity

Thanks, Zhang. But this again seems a slightly strange thing to do, since one
(say R2) of the 2 closest replicas (say R1, R2) might be down, resulting in a
read failure while there are still enough replicas (R1 and R3) live to
satisfy the read.



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583372.html


Re: What does ReadRepair exactly do?

2012-10-23 Thread shankarpnsn
Hello, 

This conversation precisely targets a question that I have had for a
while. I would be grateful if someone could clarify it a little further:

Considering the case of a repair created due to a consistency constraint
(first case in the discussion above), would the following interpretation be
correct ?

1. A digest mismatch exception is raised even if only one among the many
responses differs (even if consistency is met on an out-of-date value, say
by virtue of timestamp).
2. A read is initiated by the callback to fetch data from all replicas.
3. Resolve() is invoked to find the deltas for each replica that was out of
date. 
4. ReadRepair is scheduled to the above replicas. 
5. Perform a normal read and check if this meets the consistency
constraints. Mismatches would trigger a repair again. 

Assuming the above is true, would the mutations in step 4 and the read in
step 5 happen in parallel? In other words, would the time taken by the read
correction be the round trip between the coordinator and its farthest
replica that meets the consistency constraint?
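
Steps 1, 3 and 4 of the interpretation above can be sketched as below. This only models the interpretation being asked about, not Cassandra's actual classes: the `digest` and `resolve` helpers, the MD5 choice and the (value, timestamp) response format are assumptions for illustration.

```python
import hashlib

def digest(value):
    """Hash a value so that responses can be compared without shipping the data."""
    return hashlib.md5(value.encode()).hexdigest()

def resolve(responses):
    """Pick the newest (value, ts) pair and list the replicas that need it.

    responses: dict mapping replica name -> (value, ts).
    """
    winner = max(responses.values(), key=lambda vt: vt[1])
    repairs = {name: winner for name, vt in responses.items() if vt != winner}
    return winner, repairs

# R2 and R3 agree on an out-of-date value; R1 has a newer one.
responses = {"R1": ("val2", 2), "R2": ("val1", 1), "R3": ("val1", 1)}

# Step 1: a single differing digest is enough to flag a mismatch.
mismatch = len({digest(v) for v, _ in responses.values()}) > 1

# Steps 3 and 4: resolve, then schedule repairs for the stale replicas.
winner, repairs = resolve(responses)
print(mismatch, winner, sorted(repairs))  # True ('val2', 2) ['R2', 'R3']
```

The sketch says nothing about whether the repair mutations and the follow-up read run in parallel, which is the open question here.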

Thanks,
Shankar



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583352.html


Re: What does ReadRepair exactly do?

2012-10-23 Thread shankarpnsn
manuzhang wrote
 why repair again? We block until the consistency constraint is met. Then
 the latest version is returned and repair is done asynchronously if any
 mismatch. We may retry read if fewer columns than required are returned.

Just to make sure I understand you correctly, consider the case where a read
repair is in flight and a subsequent write affects one or more of the
replicas that were scheduled to receive the repair mutations. In this case,
are you saying that we return the older version to the user rather than the
latest version that was effected by the write?



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583355.html