Sylvain Lebresne created CASSANDRA-5113:
-------------------------------------------
Summary: RepairCallback breaks CL guarantees
Key: CASSANDRA-5113
URL: https://issues.apache.org/jira/browse/CASSANDRA-5113
Project: Cassandra
Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Fix For: 1.1.9
Attachments: 0001-Always-ensure-CL-after-a-digest-mismatch.txt,
0002-Rename-RowRepairResolver-to-RowDataResolver.txt
RepairCallback does not validate the consistency level of the query. It seems
that this was done on purpose as the comments there says:
{noformat}
/**
* The main difference between this and ReadCallback is, ReadCallback has a
ConsistencyLevel
* it needs to achieve. Repair on the other hand is happy to repair whoever
replies within the timeout.
*/
{noformat}
Concretely, the get() method of RepairCallback:
* waits for all endpoints, even if there is more than strictly required by the
CL.
* if it timeouts, doesn't check it and always return a response.
* for some reason, it returns null unless there is strictly more than 1
response.
All of that seems wrong to me. The result of RepairCallback is what is returned
to the client in case of a digest mismatch on the first read. So we must ensure
that the CL has been reached. Also, returning null where there is 1 response
(or none) seems clearly wrong.
In fact I don't think we need a special callback for this "read all data" phase
as it is a "normal" read (the fact we do a first read with digests is just an
"optimization"). The only difference between the two phases should be how we
resolve the responses (in the first case we have digest and in the 2nd we
don't) but that's handled by the resolver.
So attaching a patch that removes RepairCallback and use ReadCallback instead.
I'm also attaching a 2nd trivial patch that renames RowRepairResolver to
RowDataResolver because I think it describe better what this actually do (i.e.
the main goal is to resolve a full data read to answer the right value to the
client, repairing inconsistent nodes is secondary).
The patch is against 1.1, because I think breaking CL guarantees is probably
serious enough to warrant pushing this to 1.1.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira