[ 
https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505783#comment-14505783
 ] 

sankalp kohli commented on CASSANDRA-7168:
------------------------------------------

cc [~krummas]
I agree with [~slebresne] that we first need to make sure the last repair time is 
consistent across replicas (CASSANDRA-9143). 
There is a lot of overlap between this ticket and CASSANDRA-6434, but I chose to 
comment on this ticket since there is a lot of discussion here :). 
CASSANDRA-6434 will only drop tombstones from repaired data. The problem with 
this is that if the repair time cannot be delivered to one replica (the failure 
CASSANDRA-9143 deals with), that replica will not drop tombstones for data which 
the other replicas will. 
Now, during a normal read or a repair consistency read, the replica which did not 
get the repair time will include some tombstones which the other replicas won't, 
because they have different views of what is repaired and what is not. This will 
cause digest mismatches, leading to a spike in latency. 

We also cannot use Benedict's approach of finding the last common repair time, 
since replicas which are ahead would have already compacted their tombstones, 
leading to the same digest-mismatch problem.
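
A tiny worked example of why that fails (all numbers hypothetical):

{code:java}
public class LastCommonRepairTime {
    public static void main(String[] args) {
        long repairTimeA = 100; // replica A received the latest repair time
        long repairTimeB = 80;  // replica B never got the update

        long lastCommon = Math.min(repairTimeA, repairTimeB); // 80

        // Replica A already treated tombstones with repairedAt <= 100 as
        // droppable and may have compacted them away; agreeing on 80 after
        // the fact cannot bring them back, so reads still diverge.
        System.out.println("last common repair time = " + lastCommon);
    }
}
{code}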

I think we need to do CASSANDRA-9143 and also only drop tombstones when we are 
sure all replicas have that repair time. 
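
A minimal sketch of that check, with invented names (ackedRepairTimes is a 
hypothetical structure, not something that exists today): a tombstone from a 
repaired sstable is only purgeable once every replica has acknowledged a repair 
at least that recent.

{code:java}
import java.util.Map;

public class TombstonePurgeCheck {
    // tombstoneRepairedAt: repairedAt of the sstable holding the tombstone.
    // ackedRepairTimes: last repair time each replica has acknowledged.
    static boolean purgeable(long tombstoneRepairedAt, Map<String, Long> ackedRepairTimes) {
        // Only drop once ALL replicas have seen a repair at least this recent.
        return ackedRepairTimes.values().stream()
                               .allMatch(t -> t >= tombstoneRepairedAt);
    }

    public static void main(String[] args) {
        Map<String, Long> acked = Map.of("r1", 100L, "r2", 100L, "r3", 80L);
        System.out.println(purgeable(90L, acked)); // false: r3 is behind
        System.out.println(purgeable(80L, acked)); // true: everyone has it
    }
}
{code}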

Also, the point at which each replica receives the message that a given set of 
sstables is repaired (and therefore stops including tombstones from them in 
reads and starts dropping those tombstones if eligible) will not be the same 
across replicas. This will cause digest mismatches during that window, which is 
not ideal. 

I have not yet thought through how this could be avoided. 



> Add repair aware consistency levels
> -----------------------------------
>
>                 Key: CASSANDRA-7168
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7168
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: T Jake Luciani
>              Labels: performance
>             Fix For: 3.1
>
>
> With CASSANDRA-5351 and CASSANDRA-2424 I think there is an opportunity to 
> avoid a lot of extra disk I/O when running queries with higher consistency 
> levels.  
> Since repaired data is by definition consistent and we know which sstables 
> are repaired, we can optimize the read path by having a REPAIRED_QUORUM which 
> breaks reads into two phases:
>  
>   1) Read from one replica the result from the repaired sstables. 
>   2) Read from a quorum only the un-repaired data.
> For the node performing 1) we can pipeline the call so it's a single hop.
> In the long run (assuming data is repaired regularly) we will end up with 
> much closer to CL.ONE performance while maintaining consistency.
> Some things to figure out:
>   - If repairs fail on some nodes we can have a situation where we don't have 
> a consistent repaired state across the replicas.  
>   
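
For reference, a rough sketch of the two-phase REPAIRED_QUORUM read the ticket 
description above proposes. Replica, ReadResult, and merge are invented types, 
not Cassandra's real read path; note how the phase-1 replica also participates 
in phase 2, so its two reads can be pipelined into a single hop.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class RepairedQuorumRead {
    interface Replica {
        CompletableFuture<ReadResult> readRepaired(String key);
        CompletableFuture<ReadResult> readUnrepaired(String key);
    }
    record ReadResult(List<String> rows) {}

    static ReadResult read(String key, List<Replica> replicas) {
        int quorum = replicas.size() / 2 + 1;

        // Phase 1: repaired data is consistent by definition, one replica is enough.
        CompletableFuture<ReadResult> repaired = replicas.get(0).readRepaired(key);

        // Phase 2: un-repaired data still needs a quorum of replicas.
        List<CompletableFuture<ReadResult>> unrepaired =
            replicas.subList(0, quorum).stream()
                    .map(r -> r.readUnrepaired(key))
                    .toList();

        ReadResult merged = repaired.join();
        for (CompletableFuture<ReadResult> f : unrepaired)
            merged = merge(merged, f.join());
        return merged;
    }

    // Placeholder reconciliation; the real path would resolve cells by timestamp.
    static ReadResult merge(ReadResult a, ReadResult b) {
        List<String> rows = new ArrayList<>(a.rows());
        rows.addAll(b.rows());
        return new ReadResult(rows);
    }
}
{code}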



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
