[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527342#comment-16527342
 ] 

Denis Garus commented on IGNITE-8131:
-------------------------------------

[~sergey-chugunov]
The fix for this task consists of two parts:

1. testClientReconnectSessionExpire1_1.

When ZK starts a cluster, one of its servers may, for some reason, fail to 
start. I think this is a bug in the ZK test framework.
That's why we should try to connect to the next ZK server if the previous 
attempt failed.
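
To illustrate the idea of part 1, here is a minimal sketch of trying the next 
server when the previous attempt fails. It is not the code from the patch: the 
class ZkFailoverSketch, the method connectToFirstAvailable and the connector 
function are hypothetical names used only for this example.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration, not part of Ignite or the ZK test framework.
final class ZkFailoverSketch {
    /**
     * Tries each server address in order and returns the first connection
     * that could be established.
     *
     * @param addrs Candidate ZK server addresses (assumed "host:port").
     * @param connector Function that opens a connection or throws on failure.
     */
    static <C> C connectToFirstAvailable(List<String> addrs, Function<String, C> connector) {
        RuntimeException last = null;

        for (String addr : addrs) {
            try {
                return connector.apply(addr);
            }
            catch (RuntimeException e) {
                // This server did not start or is unreachable: try the next one.
                last = e;
            }
        }

        // Every attempt failed (or the list was empty).
        throw new IllegalStateException("No ZK server is reachable: " + addrs, last);
    }
}
```

With this approach a single server that failed to start does not fail the 
test, as long as at least one server in the list accepts the connection.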

2. testClientReconnectSessionExpire1_2.

One of the steps of processing ZK_EVT_NODE_JOIN is updating the event's data 
on a znode in the ZookeeperDiscoveryImpl#onEventProcessed method.
At this point, the client node is assumed to have joined the topology (field 
ZkRuntimeState#joined is true), but field ZkRuntimeState#evtsData has not been 
assigned a value yet.
If the connection fails while the event's data is being updated, a reconnect 
process starts. But the reconnect cannot proceed because 
ZkRuntimeState#evtsData is null.
If we defer updating the event's data for the client's local join, the fields 
ZkRuntimeState#joined and ZkRuntimeState#evtsData will be in a consistent 
state.
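
The ordering problem from part 2 can be shown with a simplified model. This is 
only a sketch under assumptions, not the real ZookeeperDiscoveryImpl code: the 
class DeferredUpdateSketch and its methods are hypothetical, and only the 
field names joined and evtsData mirror ZkRuntimeState.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the runtime state, not the actual Ignite class.
final class DeferredUpdateSketch {
    boolean joined;
    Object evtsData;

    /** Znode updates postponed until the local state is complete. */
    private final Queue<Runnable> deferred = new ArrayDeque<>();

    /** A joined node must have events data, otherwise it cannot reconnect. */
    boolean consistent() {
        return !joined || evtsData != null;
    }

    /** Processes the local join of a client node. */
    void processLocalJoin(Object newEvtsData, Runnable znodeUpdate) {
        joined = true;

        // Running znodeUpdate right here would expose joined == true together
        // with evtsData == null if the connection failed inside the update.
        deferred.add(znodeUpdate);

        evtsData = newEvtsData;

        // Both fields are set now, so a failure in the update can be handled.
        Runnable r;

        while ((r = deferred.poll()) != null)
            r.run();
    }
}
```

In this model the znode update always observes both fields set, so a 
connection failure during the update leaves enough state to reconnect.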

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> ----------------------------------------------------------------------------
>
>                 Key: IGNITE-8131
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8131
>             Project: Ignite
>          Issue Type: Bug
>          Components: zookeeper
>            Reporter: Sergey Chugunov
>            Assignee: Denis Garus
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.7
>
>         Attachments: ZK_client_reconnect_failure.log, 
> ZK_client_reconnect_success.log
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
>     at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
>     at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
>     at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
>     at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from the client disconnect/reconnect events check. Obviously the client 
> doesn't generate these events as it is supposed to.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]).
> It is possible to reproduce the test failure locally as well, but with low 
> probability: one failure per 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
