[
https://issues.apache.org/jira/browse/CASSANDRA-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357465#comment-17357465
]
Samuel Klock edited comment on CASSANDRA-16710 at 6/4/21, 4:08 PM:
-------------------------------------------------------------------
The repro steps in the description yield ('b', 2) for the final read for
{{VERSION=2.1.22}} and {{VERSION=2.2.19}}. So 2.x versions don't appear to be
affected (unless the full row was synced to node3 via a channel the steps don't
account for).
was (Author: sklock):
The repro steps in the description yield ('b', 2) for the final read for
{{VERSION=2.1.22}} and {{VERSION=2.2.19}}. So 2.x versions don't appear to be
affected.
> Read repairs can break row isolation
> ------------------------------------
>
> Key: CASSANDRA-16710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16710
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Samuel Klock
> Priority: Urgent
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> This issue essentially revives CASSANDRA-8287, which was resolved "Later" in 2015.
> While it was possible in principle at that time for read repair to break row
> isolation, that couldn't happen in practice because Cassandra always pulled
> all of the columns for each row in response to regular reads, so read repairs
> would never partially resolve a row. CASSANDRA-10657 modified Cassandra to
> only pull the requested columns for reads, which enabled read repair to break
> row isolation in practice.
> Note also that this is distinct from CASSANDRA-14593 (for read repair
> breaking partition-level isolation): that issue (as we understand it)
> captures isolation being broken across multiple rows within an update to a
> partition, while this issue covers broken isolation across multiple columns
> within an update to a single row.
> This behavior is easy to reproduce under affected versions using {{ccm}}:
> {code:bash}
> ccm create -n 3 -v $VERSION rrtest
> ccm updateconf -y 'hinted_handoff_enabled: false
> max_hint_window_in_ms: 0'
> ccm start
> (cat <<EOF
> CREATE KEYSPACE IF NOT EXISTS rrtest WITH REPLICATION = {'class':
> 'SimpleStrategy', 'replication_factor': '3'};
> CREATE TABLE IF NOT EXISTS rrtest.kv (key TEXT PRIMARY KEY, col1 TEXT, col2
> INT);
> CONSISTENCY ALL;
> INSERT INTO rrtest.kv (key, col1, col2) VALUES ('key', 'a', 1);
> EOF
> ) | ccm node1 cqlsh
> ccm node3 stop
> (cat <<EOF
> CONSISTENCY QUORUM;
> INSERT INTO rrtest.kv (key, col1, col2) VALUES ('key', 'b', 2);
> EOF
> ) | ccm node1 cqlsh
> ccm node3 start
> ccm node2 stop
> (cat <<EOF
> CONSISTENCY QUORUM;
> SELECT key, col1 FROM rrtest.kv WHERE key = 'key';
> EOF
> ) | ccm node1 cqlsh
> ccm node1 stop
> (cat <<EOF
> CONSISTENCY ONE;
> SELECT * FROM rrtest.kv WHERE key = 'key';
> EOF
> ) | ccm node3 cqlsh
> {code}
> This snippet creates a three-node cluster with an RF=3 keyspace containing a
> table with three columns: a partition key and two value columns. (Hinted
> handoff can mask the problem unless the repro steps are executed in quick
> succession, so the snippet disables it for this exercise.) Then:
> # It adds a full row to the table with values ('a', 1), ensuring it's
> replicated to all three nodes.
> # It stops a node, then replaces the initial row with new values ('b', 2) in
> a single update, ensuring that it's replicated to both available nodes.
> # It starts the node that was down, then stops one of the other nodes and
> performs a quorum read just for the text column ({{col1}}). The read observes 'b'.
> # Finally, it stops the other node that observed the second update, then
> performs a CL=ONE read of the entire row on the node that was down for that
> update.
> If read repair respects row isolation, then the final read should observe
> ('b', 2). (('a', 1) is also acceptable if we're willing to sacrifice
> monotonicity.)
> * With {{VERSION=3.0.24}}, the final read observes ('b', 2), as expected.
> * With {{VERSION=3.11.10}} and {{VERSION=4.0-rc1}}, the final read instead
> observes ('b', 1). The same is true for 3.0.24 if CASSANDRA-10657 is
> backported to it.
> The scenario above is somewhat contrived in that it supposes multiple read
> workflows consulting different sets of columns with different consistency
> levels. Under 3.11, asynchronous read repair makes this scenario possible
> even using just CL=ONE -- and with speculative retry, even if
> {{read_repair_chance}}/{{dclocal_read_repair_chance}} are both zeroed. We
> haven't looked closely at 4.0, but even though (as we understand it) it lacks
> async read repair, scenarios like CL=ONE writes or failed,
> partially-committed CL>ONE writes create some surface area for this behavior,
> even without mixed consistency/column reads.
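> (For reference, "both zeroed" above refers to table options like the following --
> a sketch against the test table from the repro; these options exist through 3.11
> but were removed as table options in 4.0:)
> {code:bash}
> (cat <<EOF
> ALTER TABLE rrtest.kv
>   WITH read_repair_chance = 0.0
>    AND dclocal_read_repair_chance = 0.0;
> EOF
> ) | ccm node1 cqlsh
> {code}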
> Given the importance of paging to reads from wide partitions, it makes some
> intuitive sense that applications shouldn't rely on isolation at the
> partition level. Being unable to rely on row isolation is much more
> surprising, especially given that (modulo the possibility of other atomicity
> bugs) Cassandra did preserve it before 3.11. Cassandra should either find a
> solution for this in code (e.g., when performing a read repair, always
> operate over all of the columns for the table, regardless of what was
> originally requested for a read) or at least update its documentation to
> include appropriate caveats about update isolation.
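> (A possible application-level mitigation that falls out of the suggested code-side
> fix -- offered here only as a sketch, not something we've validated broadly: have
> read workflows request every column of the row, so that any read repair they
> trigger covers the whole row. For the repro above, that would mean the quorum read
> selecting all columns rather than a subset:)
> {code:bash}
> (cat <<EOF
> CONSISTENCY QUORUM;
> SELECT key, col1, col2 FROM rrtest.kv WHERE key = 'key';
> EOF
> ) | ccm node1 cqlsh
> {code}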