[ https://issues.apache.org/jira/browse/CASSANDRA-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277213#comment-17277213 ]
Andres de la Peña commented on CASSANDRA-16296:
-----------------------------------------------

The fix (and test) for the problem when joining a new node looks good to me, +1.

Regarding the client warning, I'm finding that it sticks for the session, so every {{CREATE KEYSPACE}} statement will emit all the previous warnings:
{code:java}
CREATE KEYSPACE k1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; # warn about k1
CREATE KEYSPACE k2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; # warn about k1 again
{code}
This is even more surprising when we try to fix the problem by reducing the RF:
{code:java}
CREATE KEYSPACE k WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; # warn about k1
ALTER KEYSPACE k WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; # warn about k1 again
{code}
As for the warning message in the logs, I see that when we create a keyspace with an RF greater than the number of nodes the warning is emitted twice on every node. Also, every schema-altering statement emits the warning again on every node:
{code:java}
CREATE KEYSPACE k WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; # 2 warnings
CREATE TABLE k.t (k int PRIMARY KEY, v int); # 1 warning
ALTER TABLE k.t ADD v2 int; # 1 warning
{code}
This behaviour can be seen in the provided dtest, [here|https://github.com/apache/cassandra-dtest/blob/d9287666b80864165e48a37bb76640c16f084dc7/bootstrap_test.py#L301-L302]. I don't think this is wrong, and indeed it is the common behaviour for keyspace properties validation. However, I understand that having an RF greater than the number of nodes is legitimate and the warning is just friendly advice to prevent accidents, so perhaps we might consider warning only once, at keyspace creation time. I think we could do that by moving the warning emission from {{AbstractReplicationStrategy#validateOptions}} to a new separate abstract method in {{AbstractReplicationStrategy}}, and calling that method from {{AbstractReplicationStrategy.validateReplicationStrategy}} but not from {{AbstractReplicationStrategy#createReplicationStrategy}}, if that makes sense (rough sketch below). wdyt?

Finally, the new client warning can also be emitted by {{ALTER KEYSPACE}}; we could add a test to verify it.
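To make that concrete, here is a rough, hypothetical sketch of the shape I have in mind. The signatures are heavily simplified (keyspace name, {{TokenMetadata}}, snitch and exception handling are all omitted) and {{maybeWarnOnOptions}} is just a placeholder name for the new hook, not an existing API:
{code:java}
import java.util.Map;

// Hypothetical, simplified stand-in for AbstractReplicationStrategy, only to
// illustrate where the advisory warning would move; the real signatures differ.
abstract class ReplicationStrategySketch
{
    protected final Map<String, String> configOptions;

    protected ReplicationStrategySketch(Map<String, String> configOptions)
    {
        this.configOptions = configOptions;
    }

    // Hard validation (unchanged): throws on invalid options, runs on every path.
    public abstract void validateOptions();

    // New, separate hook for advisory checks such as "RF is higher than the node
    // count"; could be abstract as suggested above, or a no-op by default.
    public void maybeWarnOnOptions()
    {
    }

    // Path used when a CREATE/ALTER KEYSPACE statement is validated:
    // hard errors plus the one-time warning.
    public static void validateReplicationStrategy(ReplicationStrategySketch strategy)
    {
        strategy.validateOptions();
        strategy.maybeWarnOnOptions();
    }

    // Path used whenever a strategy is rebuilt from schema (every schema-altering
    // statement, on every node): hard errors only, no repeated warnings.
    public static ReplicationStrategySketch createReplicationStrategy(ReplicationStrategySketch strategy)
    {
        strategy.validateOptions();
        return strategy;
    }
}
{code}
The idea is simply that the advisory check lives on its own hook that only the statement-validation path calls, so rebuilding the strategy from schema does not re-emit the warning.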
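As for the {{ALTER KEYSPACE}} client warning, just as an illustration of what the test would assert (not the actual dtest/in-JVM test we would add), this is roughly how the warning can be observed with a plain DataStax Java driver 3.x client; the contact point and RF values are placeholders and a running test cluster is assumed:
{code:java}
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class AlterKeyspaceWarningCheck
{
    public static void main(String[] args)
    {
        // Assumes a test cluster reachable on 127.0.0.1 (placeholder address).
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            session.execute("CREATE KEYSPACE IF NOT EXISTS k WITH replication = " +
                            "{'class': 'SimpleStrategy', 'replication_factor': 1}");

            // Raise the RF above the node count and inspect the client warnings
            // the server attaches to the response (protocol v4+).
            ResultSet rs = session.execute("ALTER KEYSPACE k WITH replication = " +
                                           "{'class': 'SimpleStrategy', 'replication_factor': 3}");
            List<String> warnings = rs.getExecutionInfo().getWarnings();
            System.out.println("Client warnings from ALTER KEYSPACE: " + warnings);
        }
    }
}
{code}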

> Join new node to cluster failing on C* version 4
> ------------------------------------------------
>
>                 Key: CASSANDRA-16296
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16296
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Startup and Shutdown
>            Reporter: Yakir Gibraltar
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0-beta
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Hi, i'm using cassandra 4.0-beta3,
> Join new node failing,
> Current status:
> {code}
> [root@rc801 ~]# nodetool status
> Datacenter: V4CH
> ================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  1.1.1.1  356.55 KiB  128     ?                 92ae4c39-edb3-4e67-8623-b49fd8301b66  RAC1
> UN  2.2.2.2  424.36 KiB  128     ?                 d80647a8-32b2-4a8f-8022-f5ae3ce8fbb2  RAC1
> Datacenter: V4NJ
> ================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  3.3.3.3  322.97 KiB  128     ?                 0106ce0b-18be-4321-8247-843076ff42e6  RAC1
> UN  4.4.4.4  297.6 KiB   128     ?                 d81e2b7d-9221-4b1b-86c8-e8956e3eec07  RAC1
> UN  5.5.5.5  324.75 KiB  128     ?                 5dfbaf14-8310-4d64-b61a-9da0a7a8a620  RAC1
> {code}
> Error on new host (7.7.7.7):
> {code}
> INFO [main] 2020-11-22 09:27:36,592 RangeStreamer.java:323 - Bootstrap: range Full(/7.7.7.7:7000,(-7734271291904743823,-7725025391901227341]) exists on Full(/1.1.1.1:7000,(-7734271291904743823,-7725025391901227341]) for keyspace system_traces
> ERROR [main] 2020-11-22 09:27:36,609 CassandraDaemon.java:817 - Exception encountered during startup
> java.lang.IllegalStateException: Multiple strict sources found for Full(/7.7.7.7:7000,(9208454697546654053,-9212763148842799409]), sources: [Full(/2.2.2.2:7000,(9189373804905478622,-9212763148842799409]), Full(/1.1.1.1:7000,(9189373804905478622,-9212763148842799409])]
>         at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:517)
>         at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:383)
>         at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:320)
>         at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:81)
>         at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1635)
>         at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1612)
>         at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:931)
>         at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:892)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:699)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:407)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:671)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:795)
> INFO [StorageServiceShutdownHook] 2020-11-22 09:27:36,616 HintsService.java:220 - Paused hints dispatch
> WARN [StorageServiceShutdownHook] 2020-11-22 09:27:36,617 Gossiper.java:1849 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
> INFO [StorageServiceShutdownHook] 2020-11-22 09:27:36,617 MessagingService.java:441 - Waiting for messaging service to quiesce
> {code}
> Errors on seed node:
> {code}
> INFO [Messaging-EventLoop-3-37] 2020-11-22 09:26:54,121 InboundConnectionInitiator.java:459 - /7.7.7.7:7000(/7.7.7.7:45490)->/1.1.1.1:7000-URGENT_MESSAGES-f412b9ba messaging connection established, version = 12, framing = LZ4, encryption = disabled
> INFO [Messaging-EventLoop-3-40] 2020-11-22 09:26:54,157 OutboundConnection.java:1151 - /1.1.1.1:7000(/1.1.1.1:38678)->/7.7.7.7:7000-URGENT_MESSAGES-3b23f639 successfully connected, version = 12, framing = LZ4, encryption = disabled
> INFO [GossipStage:1] 2020-11-22 09:26:56,109 Gossiper.java:1248 - Node /7.7.7.7:7000 is now part of the cluster
> INFO [GossipStage:1] 2020-11-22 09:26:56,147 Gossiper.java:1206 - InetAddress /7.7.7.7:7000 is now UP
> INFO [Messaging-EventLoop-3-1] 2020-11-22 09:26:57,378 InboundConnectionInitiator.java:459 - /7.7.7.7:7000(/7.7.7.7:45504)->/1.1.1.1:7000-SMALL_MESSAGES-77c813d4 messaging connection established, version = 12, framing = LZ4, encryption = disabled
> INFO [Messaging-EventLoop-3-38] 2020-11-22 09:26:57,383 OutboundConnection.java:1151 - /1.1.1.1:7000(/1.1.1.1:38680)->/7.7.7.7:7000-SMALL_MESSAGES-e75eaaa2 successfully connected, version = 12, framing = LZ4, encryption = disabled
> INFO [Messaging-EventLoop-3-2] 2020-11-22 09:27:36,623 InboundConnectionInitiator.java:459 - /7.7.7.7:7000(/7.7.7.7:45534)->/1.1.1.1:7000-LARGE_MESSAGES-defc02dd messaging connection established, version = 12, framing = LZ4, encryption = disabled
> INFO [Messaging-EventLoop-3-40] 2020-11-22 09:27:37,151 NoSpamLogger.java:92 - /1.1.1.1:7000->/7.7.7.7:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /7.7.7.7:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
>         at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
>         at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
>         at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:672)
>         at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:649)
>         at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:529)
>         at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> INFO [GossipStage:1] 2020-11-22 09:27:57,154 Gossiper.java:1224 - InetAddress /7.7.7.7:7000 is now DOWN
> {code}
> Looks like calculateRangesToFetchWithPreferredEndpoints was added as part of "Transient Replication": https://github.com/apache/cassandra/commit/f7431b432875e334170ccdb19934d05545d2cebd
> It's a fresh new cluster without data, and the join fails after a few seconds.
> Thank you,
> Yakir Gibraltar.