[ https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839851#comment-17839851 ]
Raymond Huffman commented on CASSANDRA-15439:
---------------------------------------------

Just bumped this ticket in Slack because I believe this issue still exists in 4.1. As an alternative to the patch linked here, could we instead check whether a node is bootstrapping with something like this?

{code}
public boolean isJoining(InetAddress endpoint)
{
    assert endpoint != null;

    publicLock.readLock().lock();
    lock.readLock().lock();
    try
    {
        // A node is joining if it still owns pending bootstrap tokens
        return bootstrapTokens.inverse().containsKey(endpoint);
    }
    finally
    {
        lock.readLock().unlock();
        publicLock.readLock().unlock();
    }
}
{code}

> Token metadata for bootstrapping nodes is lost under temporary failures
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-15439
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Josh Snyder
>            Priority: Normal
>
> In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the
> bootstrapping node after RING_DELAY, since it will [be] evicted from the TMD
> pending ranges. Should we create a ticket to address this?"
> CASSANDRA-15264 relates to the most likely cause of such situations, where
> the Cassandra daemon on the bootstrapping node completely crashes. Based on
> testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it
> is also possible to remove token metadata (and thus pending ranges, and thus
> hints) for a bootstrapping node, simply by affecting its status in the
> failure detector.
>
> A node in the cluster sees the bootstrapping node this way:
> {noformat}
> INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java:1111 - Node /PUBLIC-IP is now part of the cluster
> INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - InetAddress /PUBLIC-IP is now UP
> INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Creating new streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 files(139744616815 bytes)
> INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - InetAddress /PUBLIC-IP is now DOWN
> INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient /PUBLIC-IP has been silent for 30000ms, removing from gossip
> {noformat}
>
> Since the bootstrapping node has no tokens, it is treated like a fat client,
> and it is removed from the ring. For correctness purposes, I believe we must
> keep storing hints for the downed bootstrapping node until it is either
> assassinated or a replacement attempts to bootstrap for the same token.
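A minimal, self-contained sketch of the idea in the comment, combined with the rule proposed at the end of the description: a silent, token-less endpoint is only removed as a fat client if it holds no pending bootstrap tokens. This is not Cassandra code. `JoiningCheckSketch`, `PendingTokens`, `addBootstrapToken`, `removeEndpoint`, and `shouldEvictAsFatClient` are hypothetical names; endpoints are plain strings and tokens are `long` values instead of Cassandra's endpoint and `Token` types; a single read/write lock stands in for the two locks in the snippet; and comparing silence against a fixed fat-client timeout is an assumption about how the removal check works.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class JoiningCheckSketch
{
    // Hypothetical, simplified stand-in for the pending-token part of token metadata.
    // Bootstrap tokens are indexed by endpoint, i.e. the "inverse" view of token -> endpoint.
    static final class PendingTokens
    {
        private final ReadWriteLock lock = new ReentrantReadWriteLock(true);
        private final Map<String, Set<Long>> bootstrapTokensByEndpoint = new HashMap<>();

        void addBootstrapToken(String endpoint, long token)
        {
            lock.writeLock().lock();
            try
            {
                bootstrapTokensByEndpoint.computeIfAbsent(endpoint, e -> new HashSet<>()).add(token);
            }
            finally
            {
                lock.writeLock().unlock();
            }
        }

        // Assumption: called only when the node is assassinated, or when a
        // replacement starts bootstrapping for the same tokens.
        void removeEndpoint(String endpoint)
        {
            lock.writeLock().lock();
            try
            {
                bootstrapTokensByEndpoint.remove(endpoint);
            }
            finally
            {
                lock.writeLock().unlock();
            }
        }

        // Same idea as the proposed isJoining(): joining means "owns pending bootstrap tokens".
        boolean isJoining(String endpoint)
        {
            if (endpoint == null)
                throw new IllegalArgumentException("endpoint must not be null");
            lock.readLock().lock();
            try
            {
                return bootstrapTokensByEndpoint.containsKey(endpoint);
            }
            finally
            {
                lock.readLock().unlock();
            }
        }
    }

    // Hypothetical eviction rule: a token-less endpoint silent for longer than the timeout
    // is dropped as a fat client, unless it is still bootstrapping.
    static boolean shouldEvictAsFatClient(PendingTokens tm, String endpoint, boolean hasTokens,
                                          long silentMillis, long fatClientTimeoutMillis)
    {
        return !hasTokens && silentMillis > fatClientTimeoutMillis && !tm.isJoining(endpoint);
    }

    public static void main(String[] args)
    {
        PendingTokens tm = new PendingTokens();
        tm.addBootstrapToken("10.0.0.5", 42L);

        // Bootstrapping node: kept, so in this model its pending ranges (and hints) survive.
        System.out.println(shouldEvictAsFatClient(tm, "10.0.0.5", false, 30_001, 30_000)); // false
        // Ordinary fat client with no pending tokens: evicted.
        System.out.println(shouldEvictAsFatClient(tm, "10.0.0.9", false, 30_001, 30_000)); // true

        // After assassination or replacement, the endpoint is no longer joining.
        tm.removeEndpoint("10.0.0.5");
        System.out.println(shouldEvictAsFatClient(tm, "10.0.0.5", false, 30_001, 30_000)); // true
    }
}
```

In this sketch the only change to the eviction decision is the extra `isJoining` check; pending tokens are cleared solely by the explicit assassination or replacement path, which is the condition the description asks for before hints stop being stored.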
-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)