Nadav Har'El created CASSANDRA-14262:
----------------------------------------
Summary: View update sent multiple times during range movement
Key: CASSANDRA-14262
URL: https://issues.apache.org/jira/browse/CASSANDRA-14262
Project: Cassandra
Issue Type: Improvement
Components: Materialized Views
Reporter: Nadav Har'El
This issue concerns updating a base table that has materialized views while
token ranges are being moved, i.e., while a node is being added to or removed
from the cluster (a long process, because the data must be streamed to its new
owning node).
During this process, each view mutation we want to write to a view table may
have one or more additional "pending nodes": nodes that will hold this view
mutation once the movement completes, so we must send the view mutation to them
too. This code existed until CASSANDRA-13069, when it was accidentally removed,
and was restored in CASSANDRA-14251.
However, the current code, in mutateMV(), has each of the RF (e.g., 3) base
replicas send the view mutation to the same pending node. This is redundant: the
pending node receives RF copies of every view update, which reduces write
throughput while streaming is in progress.
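To make the redundancy concrete, here is a minimal, hypothetical model of the fan-out described above (it is not the actual mutateMV() code): every base replica that applies a base write also forwards the resulting view mutation to every pending view node. Node names are plain strings chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the current behaviour: each base replica independently
 * forwards the view mutation to all pending view nodes.
 */
public class PendingFanOut {

    /** Returns how many copies of a single view mutation each pending node receives. */
    static Map<String, Integer> copiesPerPendingNode(List<String> baseReplicas,
                                                     List<String> pendingViewNodes) {
        Map<String, Integer> copies = new HashMap<>();
        for (String base : baseReplicas) {
            // Every base replica sends its own copy to every pending node.
            for (String pending : pendingViewNodes) {
                copies.merge(pending, 1, Integer::sum);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> base = List.of("A", "B", "C"); // RF = 3
        List<String> pending = List.of("P");        // one joining node
        System.out.println(copiesPerPendingNode(base, pending)); // prints {P=3}
    }
}
```

With RF = 3 and one joining node, the pending node receives three identical copies of each view update, where one would suffice.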
I suggested (based on an idea by [~shlomi_livne]) that it may be enough for
only the single base replica that will be paired with the pending node (once the
range movement completes) to send it the update. [~pauloricardomg] replied (see
[https://lists.apache.org/thread.html/12c78582a3f709ca33a45e5fa6121148b1b1ad9c9b290d1a21e4409b@%3Cdev.cassandra.apache.org%3E])
that such an optimization appears to work in the common case of a single
movement, but not in rarer, more complex cases (I did not fully understand the
details; see the above link).
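The proposed optimization can be sketched as follows. This is a hypothetical illustration, not Cassandra's pairing code: it assumes base replicas are paired with view replicas by position in two consistently ordered lists, and that the post-movement view replica list (which includes the pending node) is known. Where no pairing can be determined, such as in the more complex cases mentioned above, the sketch conservatively falls back to the current behaviour of every base replica sending.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: only the base replica that will be paired with the
 * pending view node after the range movement sends it the view update.
 * Assumes positional pairing of base and view replica lists.
 */
public class PairedSender {

    /** The base replica paired with pendingNode after the move, if one can be determined. */
    static Optional<String> senderFor(String pendingNode,
                                      List<String> baseReplicas,
                                      List<String> viewReplicasAfterMove) {
        int idx = viewReplicasAfterMove.indexOf(pendingNode);
        if (idx < 0 || idx >= baseReplicas.size()) {
            return Optional.empty(); // no well-defined pairing
        }
        return Optional.of(baseReplicas.get(idx));
    }

    /** True if the base replica named self should send the view update to pendingNode. */
    static boolean shouldSend(String self, String pendingNode,
                              List<String> baseReplicas,
                              List<String> viewReplicasAfterMove) {
        return senderFor(pendingNode, baseReplicas, viewReplicasAfterMove)
                .map(self::equals)
                // No pairing known: fall back to the current behaviour, where every base replica sends.
                .orElse(true);
    }

    public static void main(String[] args) {
        List<String> base = List.of("A", "B", "C");
        // After the move, joining node P takes the second view replica position.
        List<String> viewAfter = List.of("X", "P", "Z");
        for (String b : base) {
            System.out.println(b + " sends to P: " + shouldSend(b, "P", base, viewAfter));
        }
    }
}
```

Here only B sends to P. The fallback keeps the sketch safe when the future pairing is ambiguous, at the cost of reverting to RF-fold sends in exactly those cases.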
I believe there is another problem with the current code, one of correctness:
if a view replica ends up with two different view rows for the same partition
key, the mistake cannot currently be fixed (see CASSANDRA-10346). But if
different base replicas hold two different values (an inconsistency that an
ordinary base repair could fix, if we ran it) and both send their update to the
same pending view replica, that view replica will end up with two rows, one of
them wrong, which cannot currently be repaired.
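The correctness problem can be illustrated with a small, hypothetical simulation. It assumes a view whose key includes a regular base column, and models the pending view replica as a set of view rows. Each base replica derives the view row from its own local copy of that column, so if one base replica has missed an update, the replicas produce different view rows for the same base key.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical simulation of divergent base replicas all writing to one pending view replica. */
public class DivergentViewRows {

    /** A view row keyed by the base column value followed by the base partition key. */
    record ViewRow(int viewKey, int basePk) {}

    /**
     * Applies, on one pending view replica, the view updates sent by the given
     * base replicas; each sender uses its own local value of the view-key column.
     */
    static Set<ViewRow> applyOnPending(int basePk, Map<String, Integer> localValueByBaseReplica,
                                       List<String> senders) {
        Set<ViewRow> pendingReplica = new HashSet<>();
        for (String sender : senders) {
            int localValue = localValueByBaseReplica.get(sender);
            pendingReplica.add(new ViewRow(localValue, basePk));
        }
        return pendingReplica;
    }

    public static void main(String[] args) {
        // Base replica B missed an update to the view-key column: A and C have 1, B has 2.
        Map<String, Integer> local = Map.of("A", 1, "B", 2, "C", 1);
        Set<ViewRow> allSend = applyOnPending(42, local, List.of("A", "B", "C"));
        Set<ViewRow> oneSends = applyOnPending(42, local, List.of("A"));
        System.out.println(allSend.size());  // 2: two view rows for one base row
        System.out.println(oneSends.size()); // 1
    }
}
```

When all base replicas send, the pending replica ends up with two view rows for base key 42; when a single base replica sends, it holds one row reflecting that replica's state.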
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)