[
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Snyder updated CASSANDRA-15439:
------------------------------------
Description:
In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the
bootstrapping node after RING_DELAY, since it will be evicted from the TMD
pending ranges. Should we create a ticket to address this?"
CASSANDRA-15264 relates to the most likely cause of such situations, where the
Cassandra daemon on the bootstrapping node completely crashes. Based on testing
with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it is also
possible to remove token metadata (and thus pending ranges, and thus hints) for
a bootstrapping node simply by causing the failure detector to mark it DOWN.
A node in the cluster sees the bootstrapping node this way:
{noformat}
INFO [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java:1111 - Node /PUBLIC-IP is now part of the cluster
INFO [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - InetAddress /PUBLIC-IP is now UP
INFO [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Creating new streaming plan for Bootstrap
INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
INFO [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
INFO [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 files(139744616815 bytes)
INFO [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - InetAddress /PUBLIC-IP is now DOWN
INFO [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient /PUBLIC-IP has been silent for 30000ms, removing from gossip
{noformat}
Since the bootstrapping node has no tokens, it is treated like a fat client
and removed from the ring once it has been silent for 30000ms. For correctness,
I believe we must keep storing hints for the downed bootstrapping node until
either it is assassinated or a replacement attempts to bootstrap for the same
token.
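The eviction rule described above can be modelled with a small sketch. This is
an illustration only, not Cassandra's actual code: the names {{Endpoint}},
{{should_evict}}, {{still_in_ring}} and the {{protect_bootstrapping}} flag are
hypothetical, and the 30000ms timeout is taken from the log line above. The
flag shows one possible shape of the proposed behaviour, where a token-less
node that is known to be bootstrapping is not evicted merely for being silent.

```python
from dataclasses import dataclass

# Assumption: the silence threshold matches the 30000ms reported in the log.
FAT_CLIENT_TIMEOUT_MS = 30_000


@dataclass
class Endpoint:
    """Hypothetical, simplified view of a gossip endpoint."""
    name: str
    tokens: frozenset = frozenset()  # empty while the node is still bootstrapping
    bootstrapping: bool = False
    last_heard_ms: int = 0           # time of the last gossip update


def should_evict(ep, now_ms, protect_bootstrapping=False):
    """Return True if the endpoint would be dropped as a silent fat client."""
    if ep.tokens:
        # Nodes that own tokens are ring members, never fat clients.
        return False
    if protect_bootstrapping and ep.bootstrapping:
        # Sketch of the proposed fix: keep silent bootstrapping nodes.
        return False
    return now_ms - ep.last_heard_ms > FAT_CLIENT_TIMEOUT_MS


def still_in_ring(endpoints, now_ms, protect_bootstrapping=False):
    """Names of endpoints that remain known, and so can still receive hints."""
    return [
        ep.name
        for ep in endpoints
        if not should_evict(ep, now_ms, protect_bootstrapping)
    ]


if __name__ == "__main__":
    joining = Endpoint("joining", bootstrapping=True)
    member = Endpoint("member", tokens=frozenset({42}))
    print(still_in_ring([joining, member], now_ms=31_000))
    print(still_in_ring([joining, member], now_ms=31_000,
                        protect_bootstrapping=True))
```

In this model the paused bootstrapping node disappears once it has been silent
for longer than the timeout, which is the point at which hints for it stop
being stored. With the hypothetical flag set, it stays visible until something
else (such as an assassination or a replacement) removes it.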
> Hints for bootstrapping nodes are dropped under temporary failures
> ------------------------------------------------------------------
>
> Key: CASSANDRA-15439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
> Project: Cassandra
> Issue Type: Bug
> Reporter: Josh Snyder
> Priority: Normal