[ 
https://issues.apache.org/jira/browse/CASSANDRA-12991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728633#comment-15728633
 ] 

Jeremiah Jordan commented on CASSANDRA-12991:
---------------------------------------------

So I have thought about this before and decided the complexity of trying to 
improve this wasn't worth the streaming saved. The consequences are that some 
number of in flight partitions are going to have their unrepaired parts 
streamed around.

One idea I had for this was to have repair agree to a time X seconds in the 
future to do the flush, and to flush only data with time stamps less than X / 2 
seconds, or to split the flush into data before that and data after that. But 
that seemed like a lot of extra complexity to flushing and repair for what I 
figured wasn't much savings.

> Inter-node race condition in validation compaction
> --------------------------------------------------
>
>                 Key: CASSANDRA-12991
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12991
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benjamin Roth
>            Priority: Minor
>
> Problem:
> When a validation compaction is triggered by a repair it may happen that due 
> to flying in mutations the merkle trees differ but the data is consistent 
> however.
> Example:
> t = 10000: 
> Repair starts, triggers validations
> Node A starts validation
> t = 10001:
> Mutation arrives at Node A
> t = 10002:
> Mutation arrives at Node B
> t = 10003:
> Node B starts validation
> Hashes of node A+B will differ but data is consistent from a view (think of 
> it like a snapshot) t = 10000.
> Impact:
> Unnecessary streaming happens. This may not a big impact on low traffic CFs, 
> partitions but on high traffic CFs and maybe very big partitions, this may 
> have a bigger impact and is a waste of resources.
> Possible solution:
> Build hashes based upon a snapshot timestamp.
> This requires SSTables created after that timestamp to be filtered when doing 
> a validation compaction:
> - Cells with timestamp > snapshot time have to be removed
> - Tombstone range markers have to be handled
>  - Bounds have to be removed if delete timestamp > snapshot time
>  - Boundary markers have to be either changed to a bound or completely 
> removed, depending if start and/or end are both affected or not
> Probably this is a known behaviour. Have there been any discussions about 
> this in the past? Did not find an matching issue, so I created this one.
> I am happy about any feedback, whatsoever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to