[
https://issues.apache.org/jira/browse/CASSANDRA-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080852#comment-17080852
]
Ekaterina Dimitrova edited comment on CASSANDRA-15601 at 4/10/20, 6:47 PM:
---------------------------------------------------------------------------
Hi [~aleksey],
I see you are assigned as a reviewer. Are you looking into this one? Do you
need help if you are busy?
was (Author: e.dimitrova):
Hi [~aleksey],
I see you are assigned as a reviewer, are you looking into this one?
> Ensure repaired data tracking reads a consistent amount of data across
> replicas
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-15601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15601
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Sam Tunnicliffe
> Assignee: Sam Tunnicliffe
> Priority: Normal
> Fix For: 4.0-alpha
>
>
> When generating a digest for repaired data tracking, the amount of repaired
> data that needs to be read may depend on the unrepaired data on the replica.
> As this may vary between replicas, digest mismatches can be reported even
> though the repaired data may actually be in sync.
> For example, consider two replicas, A and B, and a table and query like:
> {code}
> CREATE TABLE tbl (pk int, ck int, PRIMARY KEY (pk, ck)) WITH CLUSTERING ORDER
> BY (ck DESC);
> Unrepaired
> ===========
> Instance A
> (0, 5)
> Instance B
> (0, 6)
> (0, 5)
> Repaired (Both A & B)
> =========
> (0, 4)
> (0, 3)
> (0, 2)
> (0, 1)
> (0, 0)
> SELECT * FROM tbl WHERE pk = 0 LIMIT 3;
> {code}
> Instance A would read (0, 5) from the unrepaired set and (0, 4) (0, 3) from
> the repaired set.
> Instance B would read (0, 6) (0, 5) from its unrepaired set and just (0, 4)
> from repaired data.
> Unrepaired row/range/partition tombstones shadowing repaired data and present
> on some replicas but not others will have the opposite effect, with more
> repaired data being read in comparison.
> To fix this, when repaired data tracking is in effect, each replica needs to
> overread during a full data read. Replicas should read up to {{LIMIT}} (i.e.
> the {{DataLimit}} of the {{ReadCommand}}) from the repaired set, regardless
> of how much is read from the unrepaired data. Once that amount of repaired
> data has been read, the replica should stop updating the digest. So if
> unrepaired tombstones cause more than {{LIMIT}} repaired data to be read,
> the digest is only calculated over the first {{LIMIT}}-worth of repaired data.
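
The behaviour described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra's implementation: the function `read`, its parameters, the `overread` flag, and the use of SHA-256 over `repr()` are all hypothetical, rows are plain `(pk, ck)` tuples, and unrepaired tombstones are modelled as a set of deleted `ck` values. It replays the example from the description (LIMIT 3) with and without overreading.

```python
import hashlib

LIMIT = 3
REPAIRED = [(0, 4), (0, 3), (0, 2), (0, 1), (0, 0)]  # identical on A and B


def read(unrepaired, tombstones, repaired, limit, overread):
    """Return (live_rows, repaired_digest) for one replica (toy model).

    unrepaired: (pk, ck) rows present only in unrepaired data
    tombstones: set of ck values deleted by unrepaired tombstones
    overread:   False = stop once `limit` live rows are found;
                True  = also keep reading until `limit` repaired rows are digested
    """
    tagged = [(row, False) for row in unrepaired] + [(row, True) for row in repaired]
    tagged.sort(key=lambda t: t[0][1], reverse=True)  # clustering order: ck DESC

    live, digested = [], []
    for row, is_repaired in tagged:
        if len(live) >= limit and (not overread or len(digested) >= limit):
            break
        # Without overreading, every repaired row touched goes into the digest;
        # with it, the digest is capped at the first `limit` repaired rows.
        if is_repaired and (not overread or len(digested) < limit):
            digested.append(row)
        if row[1] not in tombstones and len(live) < limit:
            live.append(row)

    return live, hashlib.sha256(repr(digested).encode()).hexdigest()


# Example from the description: A has (0, 5) unrepaired, B has (0, 6) and (0, 5).
for overread in (False, True):
    _, digest_a = read([(0, 5)], set(), REPAIRED, LIMIT, overread)
    _, digest_b = read([(0, 6), (0, 5)], set(), REPAIRED, LIMIT, overread)
    print(overread, digest_a == digest_b)  # prints "False False", then "True True"
```

In this model the rows returned to the coordinator are unchanged by overreading; only the span of repaired data covered by the digest is made independent of each replica's unrepaired data. The same holds if one replica has an unrepaired tombstone (for example `tombstones={4}`) that the other lacks.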