Stanislav Lukyanov created IGNITE-7753:
------------------------------------------

             Summary: Processors are incorrectly initialized if a node joins during cluster activation
                 Key: IGNITE-7753
                 URL: https://issues.apache.org/jira/browse/IGNITE-7753
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.3, 2.4, 2.5
            Reporter: Stanislav Lukyanov
            Assignee: Stanislav Lukyanov


If a node joins during the cluster activation process (while the related 
exchange operation is in progress), some of that node's GridProcessor 
instances are initialized incorrectly. GridClusterStateProcessor correctly 
reports the active cluster state, but other processors that are sensitive to 
the cluster state, e.g. GridServiceProcessor, are not initialized.

A reproducer is below. 
=======================
// Start a single server node.
Ignite server = IgnitionEx.start(
    "examples/config/persistentstore/example-persistent-store.xml", "server");

// Activate the cluster from a separate thread so that activation
// overlaps with the client start below.
CyclicBarrier barrier = new CyclicBarrier(2);
Thread activationThread = new Thread(() -> {
    try {
        barrier.await();
        server.active(true);
    }
    catch (Exception e) {
        e.printStackTrace();
    }
});
activationThread.start();
barrier.await();

// Start a client node while activation is (hopefully) still in progress.
IgnitionEx.setClientMode(true);
Ignite client = IgnitionEx.start(
    "examples/config/persistentstore/example-persistent-store.xml", "client");

activationThread.join();

// Hangs forever if the client joined in the middle of activation.
client.services().deployClusterSingleton("myClusterSingleton",
    new SimpleMapServiceImpl<>());
=======================

Here a single server node is started, then a client node is started while the 
cluster is being activated, and finally the client attempts to deploy a 
service. As a result, the thread calling the deploy method hangs forever with 
a stack trace like this:
=======================
"main@1" prio=5 tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
          at sun.misc.Unsafe.park(Unsafe.java:-1)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
          at org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7505)
          at org.apache.ignite.internal.processors.service.GridServiceProcessor.serviceCache(GridServiceProcessor.java:290)
          at org.apache.ignite.internal.processors.service.GridServiceProcessor.writeServiceToCache(GridServiceProcessor.java:728)
          at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:634)
          at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:600)
          at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployMultiple(GridServiceProcessor.java:488)
          at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployClusterSingleton(GridServiceProcessor.java:469)
          at org.apache.ignite.internal.IgniteServicesImpl.deployClusterSingleton(IgniteServicesImpl.java:120)
=======================
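
The trace above shows the caller parked in CountDownLatch.await, reached from 
GridServiceProcessor.serviceCache through IgniteUtils.awaitQuiet. The sketch 
below is not Ignite code; it is a minimal plain-JDK illustration of that kind 
of hang, under the assumption (inferred from the trace, not verified against 
the Ignite source) that the latch is released only when the processor handles 
activation. StubProcessor, onActivate and cacheOrNull are hypothetical names. 
A bounded wait is used so the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Illustrative only: not Ignite code. All names here are hypothetical. */
public class LatchHangSketch {
    /** Stand-in for a processor whose cache is usable only after activation. */
    static class StubProcessor {
        private final CountDownLatch startLatch = new CountDownLatch(1);
        private volatile String cache;

        /** Assumed to run on activation; never runs if the activation is missed. */
        void onActivate() {
            cache = "service-cache";
            startLatch.countDown();
        }

        /** Waits up to timeoutMs for initialization; returns null on timeout. */
        String cacheOrNull(long timeoutMs) throws InterruptedException {
            return startLatch.await(timeoutMs, TimeUnit.MILLISECONDS) ? cache : null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StubProcessor initialized = new StubProcessor();
        initialized.onActivate();
        System.out.println("initialized: " + initialized.cacheOrNull(200));

        // onActivate() is never called: an unbounded await() would block forever.
        StubProcessor missed = new StubProcessor();
        System.out.println("missed: " + missed.cacheOrNull(200));
    }
}
```

With an unbounded await() in place of the timed one, the second call would 
never return, which matches the WAITING state in the trace.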

The behavior is timing-dependent: the client has to join in the middle of the 
activation's exchange process. Putting Thread.sleep(4000) into 
GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest seems to be enough 
to reproduce the hang on a development laptop.
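
To illustrate why an artificial delay helps, here is a toy model of this class 
of race in plain Java. It is only a sketch and makes no claim about how Ignite 
implements the exchange: ToyCluster and all of its methods are hypothetical. 
The model assumes that activation fixes its set of participants when it 
starts, so a member that joins afterwards sees the active flag but is never 
notified. The pause plays the role of the Thread.sleep(4000) mentioned above; 
the latch is what makes the sketch deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative model only: this is not how Ignite implements the exchange. */
public class RaceWindowSketch {
    /** Hypothetical cluster: activation notifies only the members it saw at start. */
    static class ToyCluster {
        private final List<Runnable> members = new CopyOnWriteArrayList<>();
        private volatile boolean active;

        void join(Runnable onActivate) {
            members.add(onActivate);
        }

        boolean active() {
            return active;
        }

        void activate(long pauseMs, CountDownLatch started) throws InterruptedException {
            List<Runnable> snapshot = new ArrayList<>(members); // participants fixed here
            started.countDown();                                // activation is now in progress
            Thread.sleep(pauseMs);                              // artificial delay widening the window
            active = true;
            for (Runnable r : snapshot)
                r.run();                                        // late joiners are not in the snapshot
        }
    }

    /** Joins a member while activation is in progress and reports what it observed. */
    static String run(long pauseMs) throws InterruptedException {
        ToyCluster cluster = new ToyCluster();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean notified = new AtomicBoolean();

        Thread activation = new Thread(() -> {
            try {
                cluster.activate(pauseMs, started);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        activation.start();

        started.await();                        // wait until activation has begun
        cluster.join(() -> notified.set(true)); // join in the middle of activation
        activation.join();

        return "active=" + cluster.active() + ", joinerNotified=" + notified.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100)); // prints "active=true, joinerNotified=false"
    }
}
```

In the model, the late joiner ends up in the same position as the client in 
the reproducer: the cluster state reads as active, but the callback that would 
have initialized it never ran.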



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
