Sylvain Lebresne created CASSANDRA-5113:
-------------------------------------------

             Summary: RepairCallback breaks CL guarantees
                 Key: CASSANDRA-5113
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5113
             Project: Cassandra
          Issue Type: Bug
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
             Fix For: 1.1.9
         Attachments: 0001-Always-ensure-CL-after-a-digest-mismatch.txt, 
0002-Rename-RowRepairResolver-to-RowDataResolver.txt

RepairCallback does not validate the consistency level of the query. It seems 
that this was done on purpose, as the comment there says:
{noformat}
/**
 * The main difference between this and ReadCallback is, ReadCallback has a ConsistencyLevel
 * it needs to achieve.  Repair on the other hand is happy to repair whoever replies within the timeout.
 */
{noformat}
Concretely, the get() method of RepairCallback:
* waits for all endpoints, even if there are more than strictly required by the CL.
* if it times out, does not check for it and always returns a response.
* for some reason, returns null unless there is strictly more than 1 response.
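For contrast, here is a minimal sketch of a callback that does enforce the CL. It is not Cassandra's code: BlockForCallback, ReadTimeout and the blockFor parameter are hypothetical names, and replies are plain strings. It waits only for the number of replies the CL requires, fails on timeout instead of returning whatever arrived, and returns a single reply when one is all the CL asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical unchecked exception, used here to signal that the CL was not reached.
final class ReadTimeout extends RuntimeException {
    ReadTimeout(String message) { super(message); }
}

// Hypothetical stand-in for a read callback; not the actual Cassandra class.
final class BlockForCallback {
    private final CountDownLatch latch;                       // counts down to the CL requirement
    private final List<String> responses = new ArrayList<>();

    // blockFor: number of replies the consistency level requires (assumed >= 1).
    BlockForCallback(int blockFor) {
        this.latch = new CountDownLatch(blockFor);
    }

    // Called once per replica reply.
    synchronized void response(String reply) {
        responses.add(reply);
        latch.countDown();
    }

    // Waits for blockFor replies only, and fails if they do not arrive in time.
    List<String> get(long timeout, TimeUnit unit) {
        boolean reached;
        try {
            reached = latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ReadTimeout("interrupted while waiting for replies");
        }
        if (!reached) {
            throw new ReadTimeout("consistency level not reached before timeout");
        }
        synchronized (this) {
            return new ArrayList<>(responses);                // snapshot of what has arrived so far
        }
    }
}
```

With blockFor set to 2 and a single reply received, get() throws ReadTimeout rather than returning a partial result.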

All of that seems wrong to me. The result of RepairCallback is what is returned 
to the client in case of a digest mismatch on the first read. So we must ensure 
that the CL has been reached. Also, returning null when there is only 1 response 
(or none) seems clearly wrong.

In fact I don't think we need a special callback for this "read all data" phase 
as it is a "normal" read (the fact we do a first read with digests is just an 
"optimization"). The only difference between the two phases should be how we 
resolve the responses (in the first case we have digests and in the 2nd we 
don't), but that's handled by the resolver.
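The idea can be sketched as one read path that takes the resolver as a parameter. Everything below is a hypothetical illustration, not the attached patch: Reply, ReplyResolver, Resolvers and DigestMismatch are made-up names, and the "digest" phase is simplified to comparing whole replies instead of hashes.

```java
import java.util.List;

// Hypothetical reply: a value plus the timestamp it was written at.
final class Reply {
    final String value;
    final long timestamp;
    Reply(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

// The only thing that differs between the two read phases.
interface ReplyResolver {
    Reply resolve(List<Reply> replies);
}

// Hypothetical signal that replicas disagree and a full data read is needed.
final class DigestMismatch extends RuntimeException {
}

final class Resolvers {
    // First phase (simplified): every reply must agree, otherwise report a mismatch.
    static final ReplyResolver DIGEST = replies -> {
        Reply first = replies.get(0);
        for (Reply r : replies) {
            if (!r.value.equals(first.value) || r.timestamp != first.timestamp) {
                throw new DigestMismatch();
            }
        }
        return first;
    };

    // Second phase: full data from every replica, the newest write wins.
    static final ReplyResolver DATA = replies -> {
        Reply newest = replies.get(0);
        for (Reply r : replies) {
            if (r.timestamp > newest.timestamp) {
                newest = r;
            }
        }
        return newest;
    };

    // Shared read path: the CL check is identical for both phases.
    static Reply read(List<Reply> replies, int blockFor, ReplyResolver resolver) {
        if (replies.size() < blockFor) {
            throw new IllegalStateException("consistency level not reached");
        }
        return resolver.resolve(replies);
    }
}
```

A caller would try Resolvers.read(replies, blockFor, Resolvers.DIGEST) first and, on DigestMismatch, repeat the read with Resolvers.DATA; both calls go through the same CL check.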

So I'm attaching a patch that removes RepairCallback and uses ReadCallback 
instead. I'm also attaching a 2nd trivial patch that renames RowRepairResolver 
to RowDataResolver because I think it better describes what this actually does 
(i.e. the main goal is to resolve a full data read to return the right value to 
the client; repairing inconsistent nodes is secondary).

The patch is against 1.1, because I think breaking CL guarantees is probably 
serious enough to warrant pushing this to 1.1.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
