[ https://issues.apache.org/jira/browse/IGNITE-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369189#comment-16369189 ]

Stanislav Lukyanov edited comment on IGNITE-7753 at 2/19/18 2:44 PM:
---------------------------------------------------------------------

The bug is caused by a coding error in
GridClusterStateProcessor.onStateFinishMessage. The future that holds the
activation result for a joining node is always completed with false
(joinFut.onDone(false)), regardless of the actual cluster state. The patch
below fixes the problem:

--- modules/core/src/main/java/org/apache/ignite/internal/processors/cluster/GridClusterStateProcessor.java	(revision 1a6e54489d58ceb50521523c00383b13d6e3bd8b)
+++ modules/core/src/main/java/org/apache/ignite/internal/processors/cluster/GridClusterStateProcessor.java	(date 1519047408803)
@@ -389,7 +389,7 @@
             TransitionOnJoinWaitFuture joinFut = this.joinFut;
 
             if (joinFut != null)
-                joinFut.onDone(false);
+                joinFut.onDone(msg.clusterActive());
 
             GridFutureAdapter<Void> transitionFut = transitionFuts.remove(state.transitionRequestId());
 
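To show what the one-line change does, here is a minimal, hypothetical Java model of the join handshake. JoinHandshakeSketch, StateFinishMessage and the two handler methods are illustrative names, not Ignite classes. joinFut is modeled with a plain CompletableFuture<Boolean> instead of Ignite's TransitionOnJoinWaitFuture, and the sketch assumes that a joining node uses the future's value to decide whether the cluster is active.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified model of the join-during-activation handshake.
// None of these names are real Ignite classes.
public class JoinHandshakeSketch {
    /** Stand-in for the "state change finished" message; carries the resulting state. */
    static class StateFinishMessage {
        private final boolean clusterActive;

        StateFinishMessage(boolean clusterActive) {
            this.clusterActive = clusterActive;
        }

        boolean clusterActive() {
            return clusterActive;
        }
    }

    /** Completed when the transition finishes; a joining node reads it to learn the cluster state. */
    final CompletableFuture<Boolean> joinFut = new CompletableFuture<>();

    /** Mirrors the original code: always reports "inactive", whatever the message says. */
    void onStateFinishMessageBuggy(StateFinishMessage msg) {
        joinFut.complete(false);
    }

    /** Mirrors the patch: propagates the state carried by the message. */
    void onStateFinishMessageFixed(StateFinishMessage msg) {
        joinFut.complete(msg.clusterActive());
    }

    public static void main(String[] args) {
        StateFinishMessage activated = new StateFinishMessage(true);

        JoinHandshakeSketch buggy = new JoinHandshakeSketch();
        buggy.onStateFinishMessageBuggy(activated);

        JoinHandshakeSketch fixed = new JoinHandshakeSketch();
        fixed.onStateFinishMessageFixed(activated);

        System.out.println("buggy: " + buggy.joinFut.join()); // buggy: false
        System.out.println("fixed: " + fixed.joinFut.join()); // fixed: true
    }
}
```

In both cases the cluster actually became active. Only the fixed handler passes that information on to the joining node.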




> Processors are incorrectly initialized if a node joins during cluster activation
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-7753
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7753
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3, 2.4, 2.5
>            Reporter: Stanislav Lukyanov
>            Assignee: Stanislav Lukyanov
>            Priority: Major
>
> If a node joins during the cluster activation process (while the related
> exchange operation is in progress), some of that node's GridProcessor
> instances are initialized incorrectly. GridClusterStateProcessor correctly
> reports the active cluster state, but other processors that are sensitive to
> the cluster state, e.g. GridServiceProcessor, are not initialized.
> A reproducer is below.
> =======================
> Ignite server = IgnitionEx.start("examples/config/persistentstore/example-persistent-store.xml", "server");
> CyclicBarrier barrier = new CyclicBarrier(2);
> // Activate the cluster concurrently with the client join.
> Thread activationThread = new Thread(() -> {
>     try {
>         barrier.await();
>         server.active(true);
>     }
>     catch (Exception e) {
>         e.printStackTrace();
>     }
> });
> activationThread.start();
> barrier.await();
> // Start a client node while activation is in progress.
> IgnitionEx.setClientMode(true);
> Ignite client = IgnitionEx.start("examples/config/persistentstore/example-persistent-store.xml", "client");
> activationThread.join();
> // Hangs: the client's service processor was never initialized.
> client.services().deployClusterSingleton("myClusterSingleton", new SimpleMapServiceImpl<>());
> =======================
> Here a single server node is started. Then a client node is started while
> the cluster is being activated concurrently, and finally the client attempts
> to deploy a service. As a result, the thread calling the deploy method hangs
> forever with a stack trace like this:
> =======================
> "main@1" prio=5 tid=0x1 nid=NA waiting
>   java.lang.Thread.State: WAITING
>         at sun.misc.Unsafe.park(Unsafe.java:-1)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>         at org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7505)
>         at org.apache.ignite.internal.processors.service.GridServiceProcessor.serviceCache(GridServiceProcessor.java:290)
>         at org.apache.ignite.internal.processors.service.GridServiceProcessor.writeServiceToCache(GridServiceProcessor.java:728)
>         at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:634)
>         at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:600)
>         at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployMultiple(GridServiceProcessor.java:488)
>         at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployClusterSingleton(GridServiceProcessor.java:469)
>         at org.apache.ignite.internal.IgniteServicesImpl.deployClusterSingleton(IgniteServicesImpl.java:120)
> =======================
> The behavior depends on timing: the client has to join in the middle of the
> activation's exchange process. Adding Thread.sleep(4000) to
> GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest appears to widen
> that window enough to reproduce the hang on a development laptop.
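The hang itself can be modeled with a small, hypothetical Java sketch. It assumes, as the stack trace suggests (GridServiceProcessor.serviceCache waiting in CountDownLatch.await), that a state-sensitive processor blocks on a latch that is released only when the node activates it. It also assumes the node skips that activation when its join future reports false. ProcessorActivationSketch, ServiceProcessorModel, startNode and awaitServiceCache are illustrative names, not Ignite APIs. A timeout replaces the indefinite wait so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a cluster-state-sensitive processor; not Ignite code.
public class ProcessorActivationSketch {
    static class ServiceProcessorModel {
        // Assumption: released only when the node activates this processor.
        private final CountDownLatch startLatch = new CountDownLatch(1);

        /** Called by the node once it believes the cluster is active. */
        void onActivate() {
            startLatch.countDown();
        }

        /**
         * Stand-in for serviceCache(): waits for activation. The timeout exists only
         * so the demo terminates; the real call in the stack trace waits forever.
         */
        boolean awaitServiceCache(long timeoutMs) {
            try {
                return startLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Node startup: activates state-sensitive processors only if the join result is true. */
    static ServiceProcessorModel startNode(boolean joinResult) {
        ServiceProcessorModel proc = new ServiceProcessorModel();

        if (joinResult)
            proc.onActivate();

        return proc;
    }

    public static void main(String[] args) {
        // The cluster actually became active, but the buggy join future reported false.
        ServiceProcessorModel buggy = startNode(false);
        ServiceProcessorModel fixed = startNode(true);

        System.out.println("buggy node ready: " + buggy.awaitServiceCache(200)); // false (would hang)
        System.out.println("fixed node ready: " + fixed.awaitServiceCache(200)); // true
    }
}
```

Under these assumptions, completing the join future with the real state (msg.clusterActive()) is enough to let the processor activate and release any waiting deploy calls.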



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
