Stanislav Lukyanov created IGNITE-7753:
------------------------------------------
Summary: Processors are incorrectly initialized if a node joins
during cluster activation
Key: IGNITE-7753
URL: https://issues.apache.org/jira/browse/IGNITE-7753
Project: Ignite
Issue Type: Bug
Affects Versions: 2.3, 2.4, 2.5
Reporter: Stanislav Lukyanov
Assignee: Stanislav Lukyanov
If a node joins during the cluster activation process (while the related
exchange operation is in progress), some of the GridProcessor instances of
that node are initialized incorrectly. GridClusterStateProcessor correctly
reports the active cluster state, but other processors that are sensitive
to the cluster state, e.g. GridServiceProcessor, are not initialized.
A reproducer is below.
=======================
Ignite server = IgnitionEx.start(
    "examples/config/persistentstore/example-persistent-store.xml", "server");

CyclicBarrier barrier = new CyclicBarrier(2);

// Activate the cluster from a separate thread so that activation races with the client join.
Thread activationThread = new Thread(() -> {
    try {
        barrier.await();
        server.active(true);
    }
    catch (Exception e) {
        e.printStackTrace();
    }
});
activationThread.start();

barrier.await();

// Start a client node while the activation exchange is in progress.
IgnitionEx.setClientMode(true);
Ignite client = IgnitionEx.start(
    "examples/config/persistentstore/example-persistent-store.xml", "client");

activationThread.join();

// Hangs forever if the client joined in the middle of the activation.
client.services().deployClusterSingleton("myClusterSingleton", new SimpleMapServiceImpl<>());
=======================
Here a single server node is started. Then a client node is started while the
cluster is being activated, and finally the client attempts to deploy a
service. As a result, the thread calling the deploy method hangs forever with
a stack trace like this:
=======================
"main@1" prio=5 tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
    at sun.misc.Unsafe.park(Unsafe.java:-1)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7505)
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.serviceCache(GridServiceProcessor.java:290)
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.writeServiceToCache(GridServiceProcessor.java:728)
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:634)
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:600)
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployMultiple(GridServiceProcessor.java:488)
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployClusterSingleton(GridServiceProcessor.java:469)
    at org.apache.ignite.internal.IgniteServicesImpl.deployClusterSingleton(IgniteServicesImpl.java:120)
=======================
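The trace shows the caller parked in CountDownLatch.await, reached from
GridServiceProcessor.serviceCache. The sketch below is a simplified model of
that failure mode, not Ignite code: it assumes (the report does not confirm
this) that the latch is released only by an activation callback, which a node
joining mid-activation never receives. The class LatchedProcessor and its
methods are hypothetical, and a timed wait is used so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a processor that is sensitive to the cluster state; not Ignite code. */
public class LatchedProcessor {
    /** Released only when the activation callback runs. */
    private final CountDownLatch startLatch = new CountDownLatch(1);

    /** Assumed to be called when this node observes cluster activation. */
    public void onActivate() {
        startLatch.countDown();
    }

    /**
     * Waits for initialization, but only up to the given timeout.
     * The call in the trace above waits without a timeout, so it never returns.
     */
    public boolean awaitInitialized(long timeoutMs) throws InterruptedException {
        return startLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        LatchedProcessor missed = new LatchedProcessor();
        // The activation callback is never delivered: the wait can only time out.
        System.out.println("missed activation, initialized = " + missed.awaitInitialized(200));

        LatchedProcessor notified = new LatchedProcessor();
        notified.onActivate();
        System.out.println("saw activation, initialized = " + notified.awaitInitialized(200));
    }
}
```

In this model the first processor reports false and the second reports true.
With an untimed await, as in the trace, the first case blocks indefinitely.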
The behavior is timing-dependent: the client has to join in the middle of
the activation's exchange process. Putting Thread.sleep(4000) into
GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest seems to be enough
to reproduce the issue on a development laptop.
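Because the deploy call blocks forever, an automated version of the reproducer
would need to bound the wait. Below is a minimal sketch using only
java.util.concurrent. The helper HangDetector is hypothetical and not part of
Ignite; in the reproducer the task would be the deployClusterSingleton call.
The worker is a daemon thread on the assumption that the blocked call may not
respond to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical test helper; not part of Ignite. */
public class HangDetector {
    /**
     * Runs the task on a separate daemon thread and reports whether it finished in time.
     * An exception thrown by the task is propagated wrapped in ExecutionException.
     */
    public static boolean completesWithin(Callable<?> task, long timeoutMs) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-detector");
            t.setDaemon(true); // A stuck worker must not keep the JVM alive.
            return t;
        });

        try {
            Future<?> fut = exec.submit(task);

            try {
                fut.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            }
            catch (TimeoutException e) {
                return false;
            }
        }
        finally {
            exec.shutdownNow(); // Interrupts the worker if it is still blocked.
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch neverReleased = new CountDownLatch(1);

        // Stand-in for the hanging deploy call: waits on a latch nobody counts down.
        boolean hung = !completesWithin(() -> { neverReleased.await(); return null; }, 200);

        System.out.println("hang detected = " + hung);
    }
}
```

A test could then fail with a clear message when completesWithin returns
false, instead of hanging the whole run.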