[ 
https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894645#comment-15894645
 ] 

Benjamin Roth commented on CASSANDRA-12888:
-------------------------------------------

For a detailed explanation, here is an excerpt from that discussion:
-----

... there are still scenarios where it is possible to break consistency by 
repairing the base table and the view separately, even with QUORUM writes:

Initial state:

Base replica A: {k0=v0, ts=0}
Base replica B: {k0=v0, ts=0}
Base replica C: {k0=v0, ts=0}
View paired replica A: {v0=k0, ts=0}
View paired replica B: {v0=k0, ts=0}
View paired replica C: {v0=k0, ts=0}

Base replica A receives write {k1=v1, ts=1}, propagates it to view paired 
replica A, and dies.

Current state is:
Base replica A: {k1=v1, ts=1}
Base replica B: {k0=v0, ts=0}
Base replica C: {k0=v0, ts=0}
View paired replica A: {v1=k1, ts=1}
View paired replica B: {v0=k0, ts=0}
View paired replica C: {v0=k0, ts=0}

Base replicas B and C receive write {k2=v2, ts=2} and write to their paired 
view replicas. The write succeeds at QUORUM.

Current state is:
Base replica A: {k1=v1, ts=1}
Base replica B: {k2=v2, ts=2}
Base replica C: {k2=v2, ts=2}
View paired replica A: {v1=k1, ts=1}
View paired replica B: {v2=k2, ts=2}
View paired replica C: {v2=k2, ts=2}

A returns from the dead. Repair base table:
Base replica A: {k2=v2, ts=2}
Base replica B: {k2=v2, ts=2}
Base replica C: {k2=v2, ts=2}

Repair MV:
View paired replica A: {v1=k1, ts=1} and {v2=k2, ts=2}
View paired replica B: {v1=k1, ts=1} and {v2=k2, ts=2}
View paired replica C: {v1=k1, ts=1} and {v2=k2, ts=2}

So, to keep the view consistent, replica A must generate a tombstone for 
{v1=k1, ts=1} during repair of the base table, as the toy simulation below 
illustrates.
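
To make the outcome easy to verify, here is a toy Python simulation of the 
scenario (not Cassandra code). It assumes the writes target the same base row 
k, that the view is keyed by the base row's value, and that repairing each 
table separately with plain anti-entropy simply unions the replicas' data with 
last-write-wins per key; all names are illustrative.

{code:python}
# Toy simulation of the scenario above; not Cassandra code.

def write_base(replicas, value, ts, base, view):
    """Apply a base write on the given replicas and update their paired views."""
    for r in replicas:
        old = base[r]
        base[r] = {"value": value, "ts": ts}
        # The paired view replica drops the entry for the old value and
        # inserts an entry for the new one (local tombstone + insert).
        view[r].pop(old["value"], None)
        view[r][value] = ts

# Initial state: all base replicas hold v0, all view replicas map v0 -> k.
base = {r: {"value": "v0", "ts": 0} for r in "ABC"}
view = {r: {"v0": 0} for r in "ABC"}

write_base("A", "v1", 1, base, view)    # A takes {k1=v1, ts=1}, then dies
write_base("BC", "v2", 2, base, view)   # B and C take {k2=v2, ts=2}; QUORUM ok

# A returns. Repair base and view *separately* with plain anti-entropy:
# each replica ends up with the union of all replicas' data (LWW per key).
newest = max((base[r] for r in "ABC"), key=lambda c: c["ts"])
for r in "ABC":
    base[r] = dict(newest)

merged_view = {}
for r in "ABC":
    merged_view.update(view[r])
for r in "ABC":
    view[r] = dict(merged_view)

print(base["A"])   # {'value': 'v2', 'ts': 2}
print(view["A"])   # {'v1': 1, 'v2': 2}  <- 'v1' is a ghost view row: nothing
                   # tombstoned {v1=k1, ts=1} when A's base row was repaired.
{code}

Running it prints the repaired base row {k2=v2, ts=2} on every replica, while 
every view replica still contains both v1 and v2, i.e. exactly the ghost view 
entry described above.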

> Incremental repairs broken for MVs and CDC
> ------------------------------------------
>
>                 Key: CASSANDRA-12888
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12888
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Stefan Podkowinski
>            Assignee: Benjamin Roth
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x
>
>
> SSTables streamed during the repair process will first be written locally and 
> afterwards either simply added to the pool of existing sstables or, in the 
> case of existing MVs or active CDC, replayed on a per-mutation basis, as 
> described in {{StreamReceiveTask.OnCompletionRunnable}}:
> {quote}
> We have a special path for views and for CDC.
> For views, since the view requires cleaning up any pre-existing state, we 
> must put all partitions through the same write path as normal mutations. This 
> also ensures any 2is are also updated.
> For CDC-enabled tables, we want to ensure that the mutations are run through 
> the CommitLog so they can be archived by the CDC process on discard.
> {quote}
> Using the regular write path turns out to be an issue for incremental 
> repairs, as we lose the {{repaired_at}} state in the process. Eventually the 
> streamed rows will end up in the unrepaired set, in contrast to the rows on 
> the sender side, which are moved to the repaired set. The next repair run 
> will stream the same data back again, causing rows to bounce back and forth 
> between nodes on each repair.
> See the linked dtest for steps to reproduce. An example of reproducing this 
> manually using ccm can be found 
> [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8]
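
To illustrate the mechanism the quoted description refers to, here is a rough 
sketch in Python (rather than the actual Java of 
{{StreamReceiveTask.OnCompletionRunnable}}) of where the {{repaired_at}} 
metadata gets dropped. All class and function names below are illustrative 
assumptions, not Cassandra's API.

{code:python}
from dataclasses import dataclass, field
from typing import List

# Toy models; names are illustrative, not Cassandra's actual API.

@dataclass
class SSTable:
    partitions: List[str]
    repaired_at: int               # 0 means "unrepaired"

@dataclass
class Table:
    has_views: bool = False
    cdc_enabled: bool = False
    sstables: List[SSTable] = field(default_factory=list)

def receive_streamed_sstable(table: Table, incoming: SSTable) -> None:
    """Mimic the two receive paths described in the quoted text."""
    if table.has_views or table.cdc_enabled:
        # Special path: replay each partition through the normal write path
        # so view updates / CDC commit-log archiving happen. The data is
        # re-flushed into new, unrepaired sstables, so the incoming
        # repaired_at is lost and the next incremental repair will stream
        # the same rows back again.
        table.sstables.append(SSTable(list(incoming.partitions), repaired_at=0))
    else:
        # Normal path: keep the streamed sstable (and its repaired_at) as-is.
        table.sstables.append(incoming)

plain, with_view = Table(), Table(has_views=True)
streamed = SSTable(["p1"], repaired_at=1234)
receive_streamed_sstable(plain, streamed)
receive_streamed_sstable(with_view, streamed)
print(plain.sstables[0].repaired_at)       # 1234 -> stays in the repaired set
print(with_view.sstables[0].repaired_at)   # 0    -> lands in the unrepaired set
{code}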


