Dmitry Konstantinov created CASSANDRA-19879:
-----------------------------------------------

             Summary: distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest fails sometimes
                 Key: CASSANDRA-19879
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19879
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Bootstrap and Decommission
            Reporter: Dmitry Konstantinov


The org.apache.cassandra.distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest JUnit test occasionally fails with an NPE:
{code:java}
java.lang.NullPointerException: Cannot invoke "org.apache.cassandra.gms.EndpointState.getApplicationState(org.apache.cassandra.gms.ApplicationState)" because "state" is null
	at org.apache.cassandra.distributed.action.GossipHelper$PullSchemaFrom.lambda$accept$6adea493$1(GossipHelper.java:245)
	at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$async$10(IsolatedExecutor.java:156)
	at org.apache.cassandra.concurrent.FutureTask$2.call(FutureTask.java:124)
	at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
	at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840){code}
Observed during testing of CASSANDRA-19651.

The failure is not easy to reproduce. As part of Instance.startup, org.apache.cassandra.gms.Gossiper#waitToSettle waits for 5 + 3 x 1 = 8 seconds if the number of nodes discovered via gossip does not change (even if there has been no gossip interaction with other nodes at all).
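
The 8-second figure can be illustrated with a small simulation. This is only a sketch, not the actual Gossiper code: the class, method, and constant names are hypothetical, and the values (5 s minimum wait, 1 s poll interval, 3 consecutive stable polls) are taken from the arithmetic above. No real sleeping is done; the method only adds up the time a settle loop of this shape would spend.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of a "wait until gossip settles" loop.
// It is NOT the real Gossiper#waitToSettle; it only models the timing.
public class SettleWaitSketch {
    static final long MIN_WAIT_MS = 5000;      // assumed initial wait
    static final long POLL_INTERVAL_MS = 1000; // assumed poll interval
    static final int POLLS_REQUIRED = 3;       // assumed consecutive stable polls

    /** Returns the simulated total wait in milliseconds (no real sleeping). */
    static long simulatedSettleWaitMs(IntSupplier endpointCount, int maxPolls) {
        long waited = MIN_WAIT_MS;
        int previous = endpointCount.getAsInt();
        int stablePolls = 0;
        int polls = 0;
        while (stablePolls < POLLS_REQUIRED && polls < maxPolls) {
            waited += POLL_INTERVAL_MS;
            polls++;
            int current = endpointCount.getAsInt();
            if (current == previous) {
                stablePolls++;          // nothing changed since the last poll
            } else {
                stablePolls = 0;        // a change restarts the stability count
                previous = current;
            }
        }
        return waited;
    }

    public static void main(String[] args) {
        // A node that never hears from anyone always sees the same count,
        // so the loop "settles" after the minimum possible wait.
        System.out.println(simulatedSettleWaitMs(() -> 1, 100));
    }
}
```

The point of the sketch is that an unchanged endpoint count is indistinguishable from "gossip has not run yet": a loop of this shape settles in the minimum time in both cases.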

I added a 5-second sleep to org.apache.cassandra.gms.Gossiper.GossipTask#run (there is also a 1-second initial delay when GossipTask is scheduled)
{code}
private class GossipTask implements Runnable
{
    public void run()
    {
        try
        {
            //wait on messaging service to start listening
            MessagingService.instance().waitUntilListening();
            Thread.sleep(5000); // <===============================
            taskLock.lock();
{code}
and the NPE then reproduced more frequently.

So it looks like the test may fail if, for some reason, GossipTask has not had a chance to run before EndpointState.getApplicationState is invoked as part of the test logic.

Note: In 5.1 the test is different and does not have the pullSchemaFrom logic at all.

A conversation about the issue was started in CASSANDRA-19651.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org