[ https://issues.apache.org/jira/browse/CASSANDRA-19879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Konstantinov updated CASSANDRA-19879:
--------------------------------------------
    Description: 
The org.apache.cassandra.distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest JUnit test occasionally fails with an NPE:
{code:java}
java.lang.NullPointerException: Cannot invoke "org.apache.cassandra.gms.EndpointState.getApplicationState(org.apache.cassandra.gms.ApplicationState)" because "state" is null
        at org.apache.cassandra.distributed.action.GossipHelper$PullSchemaFrom.lambda$accept$6adea493$1(GossipHelper.java:245)
        at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$async$10(IsolatedExecutor.java:156)
        at org.apache.cassandra.concurrent.FutureTask$2.call(FutureTask.java:124)
        at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
        at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:840)
{code}
Observed during testing of CASSANDRA-19651. It is not easy to reproduce.
As part of Instance.startup, org.apache.cassandra.gms.Gossiper#waitToSettle 
waits 5 + 3 x 1 = 8 seconds if the number of nodes discovered via gossip does 
not change (even if there has been no gossip interaction with other nodes at 
all).
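For illustration, the 8-second figure can be modelled with the small sketch below. It assumes a 5 s minimum wait followed by polls every 1 s that must see an unchanged node count 3 times in a row; the class, method and constant names are hypothetical, and this is not the actual Gossiper#waitToSettle code.

{code:java}
import java.util.List;

/**
 * Hypothetical model of a "wait to settle" loop. Assumed constants: a 5 s
 * minimum wait, then 1 s polls until the node count is unchanged 3 times in a row.
 * This is NOT the real Gossiper#waitToSettle implementation.
 */
public class SettleWaitSketch
{
    static final long MIN_WAIT_MS = 5000;          // assumed initial wait
    static final long POLL_INTERVAL_MS = 1000;     // assumed poll interval
    static final int POLL_SUCCESSES_REQUIRED = 3;  // assumed consecutive stable polls

    /**
     * Returns the simulated time (ms) at which the loop considers gossip settled,
     * or -1 if it has not settled within the supplied observations.
     * initialCount is the node count seen at the end of the minimum wait;
     * observedCounts.get(i) is the node count seen at poll i.
     */
    public static long settleTimeMillis(int initialCount, List<Integer> observedCounts)
    {
        long elapsed = MIN_WAIT_MS;
        int previous = initialCount;
        int stablePolls = 0;
        for (int count : observedCounts)
        {
            elapsed += POLL_INTERVAL_MS;
            if (count == previous)
                stablePolls++;
            else
                stablePolls = 0; // any change restarts the count of stable polls
            previous = count;
            if (stablePolls >= POLL_SUCCESSES_REQUIRED)
                return elapsed;
        }
        return -1;
    }

    public static void main(String[] args)
    {
        // No peer is ever discovered: the count stays at 1 (only the local node).
        System.out.println(settleTimeMillis(1, List.of(1, 1, 1, 1, 1))); // 8000
        // A peer appears at the first poll, which resets the stable-poll counter.
        System.out.println(settleTimeMillis(1, List.of(2, 2, 2, 2, 2))); // 9000
    }
}
{code}

In this model a change in the node count only extends the wait; with no gossip activity at all the loop still returns after 8 s, regardless of whether any gossip round has actually run.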
I added a 5-second sleep to org.apache.cassandra.gms.Gossiper.GossipTask#run 
(note that GossipTask is also scheduled with a 1-second initial delay):
{code:java}
private class GossipTask implements Runnable
{
    public void run()
    {
        try
        {
            //wait on messaging service to start listening
            MessagingService.instance().waitUntilListening();
            Thread.sleep(5000); // <=============================== injected 5 s delay

            taskLock.lock();
            // ... rest of GossipTask#run unchanged
{code}
With this change the NPE reproduces more frequently.
So it looks like the test may fail if, for some reason, GossipTask has not had 
a chance to run before EndpointState.getApplicationState is invoked as part of 
the test logic; in that case the endpoint state the test reads is still null.
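A minimal, self-contained model of this race (hypothetical names, not Cassandra code): a scheduled task stands in for the first gossip round and fills in a peer's state after a delay, while the "test" reads that state. Reading immediately returns null, which is the situation behind the NPE. Polling with a deadline is shown only as one possible way to tolerate the delay, under these assumptions; it is not the agreed fix.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class EndpointStateRaceSketch
{
    // Hypothetical stand-in for the gossiper's endpoint state (peer address -> schema version).
    public static final Map<String, String> endpointStates = new ConcurrentHashMap<>();

    /** Polls until state for the endpoint appears or the timeout elapses; returns null on timeout. */
    public static String awaitState(String endpoint, long timeoutMillis)
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        String state;
        while ((state = endpointStates.get(endpoint)) == null && System.nanoTime() < deadline)
        {
            try
            {
                Thread.sleep(10);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop waiting
                return endpointStates.get(endpoint);
            }
        }
        return state;
    }

    public static void main(String[] args)
    {
        ScheduledExecutorService gossip = Executors.newSingleThreadScheduledExecutor();
        // The simulated "gossip round" delivers the peer's state only after a delay.
        gossip.schedule(() -> { endpointStates.put("127.0.0.2", "schema-v1"); }, 200, TimeUnit.MILLISECONDS);

        // Almost certainly null: the simulated gossip round has not run yet.
        String early = endpointStates.get("127.0.0.2");
        System.out.println("immediate read: " + early);

        // Waits for the delayed task instead of dereferencing a null state.
        String awaited = awaitState("127.0.0.2", 5000);
        System.out.println("awaited read: " + awaited);

        gossip.shutdown();
    }
}
{code}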

Note: in 5.1 the test is different and does not contain the pullSchemaFrom 
logic at all.
A conversation about the issue was started in CASSANDRA-19651.


> distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest fails sometimes
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19879
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19879
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Bootstrap and Decommission
>            Reporter: Dmitry Konstantinov
>            Priority: Low
>             Fix For: 5.0.x
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
