[ 
https://issues.apache.org/jira/browse/IGNITE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079628#comment-18079628
 ] 

Ignite TC Bot commented on IGNITE-28369:
----------------------------------------

{panel:title=Branch: [pull/13034/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/13034/head] Base: [master] : New Tests (3)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Service Grid{color} [[tests 3|https://ci2.ignite.apache.org/viewLog.html?buildId=9057753]]
* {color:#013220}IgniteServiceGridTestSuite: ServiceInfoSelfTest.testServiceTopologyEquality - PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: ServiceRedeploymentOnNodeLeftTest.testCoordinatorChangeReassignedServiceRedeployment - PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: ServiceRedeploymentOnNodeLeftTest.testReassignedServiceRedeployment - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=9057800&buildTypeId=IgniteTests24Java8_RunAll]

> Ignite Service may not be redeployed if several nodes leave the cluster
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-28369
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28369
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Assignee: Mikhail Petrov
>            Priority: Major
>              Labels: ise
>             Fix For: 2.19
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We need to fix the flaky 
> org.apache.ignite.client.ReliabilityTest#testServiceProxyFailover test. 
> See 
> https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=4795807857625973920&tab=testDetails
>  for more details.
> The org.apache.ignite.client.ReliabilityTest#testServiceProxyFailover test 
> can be used as a reproducer for this problem.
> To increase the test failure rate, place U.sleep(10) in the 
> GridNioServer.AbstractNioClientWorker#bodyInternal worker loop.
> Steps that lead to the described problem:
> 1. Consider a 3-node cluster with a singleton SERVICE deployed on node 1.
> 2. Node 1 leaves the cluster, triggering a distributed service redeployment 
> process.
> 3. The service is reassigned to node 2.
> 4. While the coordinator waits for all nodes to reply with single messages, 
> node 2 leaves the cluster.
> 5. The coordinator receives the event that node 2 has left the cluster and 
> stops waiting for its single message.
> 6. The coordinator combines the received single messages into a full 
> message that contains no information about the SERVICE or its topology, and 
> sends it across the cluster.
> 7. The service topology is set to empty on all cluster nodes.
> 8. A second service redeployment process is triggered by node 2 leaving the 
> cluster. However, at this point we do not attempt to redeploy the SERVICE 
> because node 2 is not part of the current service topology. Therefore, 
> nothing happens and the service becomes unavailable.
> Even if we fix step 8 and the service is eventually redeployed, there is a 
> period of time during which the service topology is unknown. Currently, all 
> calls during this period result in an error, which is unexpected for a 
> user.
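
The sequence described above can be modelled with a small, self-contained Java sketch. It is an illustration only and does not use Ignite code: the class ServiceRedeploymentSketch and the methods buildFullMessage and needsRedeployment are hypothetical names, and the assumption that node 3 is the only node whose single message arrives (reporting no deployed instance) is made purely for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the described race; NOT Ignite code, all names are hypothetical. */
public class ServiceRedeploymentSketch {
    /**
     * Merges per-node replies (node id -> services deployed on that node)
     * into a cluster-wide topology (service name -> node ids).
     */
    public static Map<String, Set<Integer>> buildFullMessage(Map<Integer, Set<String>> singleMessages) {
        Map<String, Set<Integer>> topology = new TreeMap<>();

        singleMessages.forEach((nodeId, services) -> {
            for (String svc : services)
                topology.computeIfAbsent(svc, k -> new TreeSet<>()).add(nodeId);
        });

        return topology;
    }

    /** Redeployment is attempted only if the node that left hosts a service in the known topology. */
    public static boolean needsRedeployment(Map<String, Set<Integer>> topology, int leftNodeId) {
        for (Set<Integer> nodes : topology.values()) {
            if (nodes.contains(leftNodeId))
                return true;
        }

        return false;
    }

    public static void main(String[] args) {
        // Node 1 has left and SERVICE was reassigned to node 2.
        // Assumption: only node 3 replies, and it hosts no service instance.
        Map<Integer, Set<String>> received = new HashMap<>();
        received.put(3, new HashSet<String>());

        // Node 2 leaves before replying, so its single message is never merged.
        Map<String, Set<Integer>> full = buildFullMessage(received);

        System.out.println("Topology after first redeployment: " + full);
        System.out.println("Redeploy when node 2 leaves: " + needsRedeployment(full, 2));
    }
}
```

In this model the merged topology is empty, so the check for node 2 finds nothing to redeploy, which mirrors the last step of the scenario.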



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
