[
https://issues.apache.org/jira/browse/CASSANDRA-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784430#comment-17784430
]
Szymon Miezal edited comment on CASSANDRA-18824 at 11/17/23 6:27 PM:
---------------------------------------------------------------------
Going back to this after a while, I have prepared new branches. The 3.0 - 3.11
commits contain the backport plus a test adjustment; the 4.0 - trunk commits
contain only the test adjustment, which makes the test independent of the other test.
[https://github.com/szymon-miezal/cassandra/commit/cf5fe235948dc08ff00a63d264a2014829365e56] (3.0)
[https://github.com/szymon-miezal/cassandra/commit/7073b03fbf34d5a8ec7b1d3460191b4eaa7bcbf3] (3.11)
[https://github.com/szymon-miezal/cassandra/commit/7c53a56ca208c880812f9a364bdd0a09c584168f] (4.0)
[https://github.com/szymon-miezal/cassandra/commit/a186a10ad52c8cffcf172bfaab776195f6ea7d13] (4.1)
[https://github.com/szymon-miezal/cassandra/commit/50f59dfd69245472e63579766e6b4d9185fa6965|https://github.com/szymon-miezal/cassandra/commit/017997b7778cfd4381477bed5c4df1aa16ef1cab] (5.0)
[https://github.com/szymon-miezal/cassandra/commit/984c0527272251223c60efde2126dd8e06a22d68] (trunk)
I haven't modified the _CHANGES.txt_ file since, IIUC, that should be done during merging.
So far I have run tests for the
[3.0|https://app.circleci.com/pipelines/github/szymon-miezal/cassandra?branch=CASSANDRA-18824-3.0]
and
[3.11|https://app.circleci.com/pipelines/github/szymon-miezal/cassandra?branch=CASSANDRA-18824-3.11]
patches, but it seems the CircleCI free tier is not suitable for running
distributed tests.
was (Author: JIRAUSER302037):
Going back to this after a while, I have prepared new branches. The 3.0 - 3.11
commits contain the backport plus a test adjustment; the 4.0 - trunk commits
contain only the test adjustment, which makes the test independent of the other test.
[https://github.com/szymon-miezal/cassandra/commit/a56ad6916e85a6956ec71fd3d85ed685c053d087] (3.0)
[https://github.com/szymon-miezal/cassandra/commit/456863025285c76f37d1e219e3688aaee33f0269] (3.11)
[https://github.com/szymon-miezal/cassandra/commit/7d4b2b4e9b2648ad1948650a803a221fe395a61c] (4.0)
[https://github.com/szymon-miezal/cassandra/commit/cc704cd3654e6f5db1ba6f3fa4c7e8e49a51a783] (4.1)
[https://github.com/szymon-miezal/cassandra/commit/017997b7778cfd4381477bed5c4df1aa16ef1cab] (5.0)
[https://github.com/szymon-miezal/cassandra/commit/035d8fa5af05467f5e519731a4d45d74b5d4738] (trunk)
I haven't modified the _CHANGES.txt_ file since, IIUC, that should be done during merging.
> Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused
> missing replica
> -------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18824
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Bootstrap and Decommission
> Reporter: Szymon Miezal
> Assignee: Szymon Miezal
> Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> Node decommission triggers data transfer to other nodes. While this transfer
> is in progress, receiving nodes temporarily hold token ranges in a pending
> state. However, the cleanup process currently doesn't consider these pending
> ranges when calculating token ownership.
> As a consequence, data that has already been written to sstables on the
> receiving node is inadvertently removed by cleanup.
> Steps to reproduce (STR):
> * Create a two-node cluster
> * Create a keyspace with RF=1
> * Insert sample data (assert the data is available when querying both nodes)
> * Start decommissioning node 1
> * Run cleanup in a loop on node 2 until the decommission of node 1 finishes
> * Verify that all rows are in the cluster; this fails because the previous
> step removed some of the rows
> It seems that the cleanup process does not take pending ranges into account;
> it uses only the local ranges:
> [https://github.com/apache/cassandra/blob/caad2f24f95b494d05c6b5d86a8d25fbee58d7c2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L466].
> There are two possible solutions to the problem.
> One would be to change the cleanup process so that it starts taking pending
> ranges into account. Even though this might sound tempting at first, it would
> require involved changes and a lot of testing effort.
> Alternatively, we could prevent the cleanup process from running (or
> interrupt it) whenever a pending range is detected on the node. That sounds
> like a reasonable alternative and is relatively easy to implement.
> The bug has already been fixed in 4.x by CASSANDRA-16418; the goal of this
> ticket is to backport that fix to 3.x.
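To make the failure mode and the proposed guard from the description concrete, here is a toy Java model, assuming a single node's view with non-wrapping long tokens. The names TokenRange, cleanup and guardedCleanup are hypothetical and do not correspond to Cassandra's actual classes; the real CASSANDRA-16418 patch is not reproduced here. The sketch only mirrors the logic described in the ticket: cleanup keeps rows in local ranges only, so a row streamed into a pending range is dropped, and the guard refuses to run cleanup while any pending range exists.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: TokenRange, cleanup and guardedCleanup are hypothetical names,
// not Cassandra classes. Tokens are plain longs and ranges never wrap around.
public class CleanupGuardSketch {

    // A token range (left, right]: left-exclusive, right-inclusive.
    record TokenRange(long left, long right) {
        boolean contains(long token) {
            return token > left && token <= right;
        }
    }

    // Conceptually what cleanup does: keep only rows whose token falls in one
    // of the given ranges and drop everything else.
    static Map<Long, String> cleanup(Map<Long, String> rows, List<TokenRange> ranges) {
        Map<Long, String> kept = new TreeMap<>();
        for (Map.Entry<Long, String> row : rows.entrySet()) {
            for (TokenRange range : ranges) {
                if (range.contains(row.getKey())) {
                    kept.put(row.getKey(), row.getValue());
                    break;
                }
            }
        }
        return kept;
    }

    // The proposed mitigation: refuse to run cleanup while the node has any
    // pending range, instead of teaching cleanup about pending ranges.
    static Map<Long, String> guardedCleanup(Map<Long, String> rows,
                                            List<TokenRange> localRanges,
                                            List<TokenRange> pendingRanges) {
        if (!pendingRanges.isEmpty()) {
            throw new IllegalStateException(
                "Node has pending ranges; refusing to run cleanup");
        }
        return cleanup(rows, localRanges);
    }

    public static void main(String[] args) {
        // Node 2 owns (0, 50]. While node 1 decommissions, node 1's range
        // (50, 100] is pending on node 2 and row 75 has already been streamed in.
        List<TokenRange> local = List.of(new TokenRange(0, 50));
        List<TokenRange> pending = List.of(new TokenRange(50, 100));
        Map<Long, String> rows = new TreeMap<>(Map.of(10L, "a", 75L, "b"));

        // Unguarded cleanup only consults local ranges, so the streamed row is lost.
        System.out.println(cleanup(rows, local)); // prints {10=a}

        // Guarded cleanup refuses to run while the range is still pending.
        try {
            guardedCleanup(rows, local, pending);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The guard trades the ability to run cleanup during a topology change for safety: once the decommission completes and no range is pending, the same call proceeds and only rows outside the (now updated) local ranges are removed.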