[
https://issues.apache.org/jira/browse/IGNITE-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536890#comment-16536890
]
Vitaliy Biryukov commented on IGNITE-8179:
------------------------------------------
Hi [~sergey-chugunov], please take a look.
This test fails for several reasons:
# Rarely, the random kill stops all server nodes, and the test then tries to
start a client node. Now, in this case, a server node is started instead.
# Sometimes *awaitPartitionMapExchange* throws *IllegalStateException*,
*IgniteClientDisconnectedException* or some exception caused by a node stopping
(nodes are forcibly killed by the communication failure resolver). I've
replaced it with *waitForTopology*.
# When a client reconnects, *checkEventsConsistency* throws an assertion
error, so I now clear the events map on each client reconnect.
# When the coordinator is killed by the communication failure resolver, the
cluster sometimes hangs on PME. Scenario: PME starts -> the coordinator leaves
-> some node sends a latch countdown ack to the future coordinator -> the
future coordinator creates a client latch (*createClientLatch*) and
immediately completes it, because the ack was received from a non-coordinator
node.
There is one more cause of failures. Rarely, a long GC pause occurs during the
first connection to the Zk cluster, and the Ignite node fails to start because
the first connection times out.
This can be solved by significantly increasing the session timeout (the
connection timeout is calculated as session timeout / Zk servers count, see
*ClientCnxn:381*), but that would greatly increase the test run time.
Alternatively, the heap size can be increased.
{noformat}
[20:09:28]W: [org.apache.ignite:ignite-zookeeper] [2018-07-06
17:09:28,624][WARN ][jvm-pause-detector-worker][ZookeeperDiscoverySpiTest9]
Possible too long JVM pause: 2803 milliseconds.
[20:09:28] : [Step 3/4] [2018-07-06 17:09:28,624][INFO
][zk-internal.ZookeeperDiscoverySpiTest9-SendThread(127.0.0.1:39805)][ClientCnxn]
Client session timed out, have not heard from server in 2854ms for sessionid
0x0, closing socket connection and attempting reconnect
[20:09:28]W: [org.apache.ignite:ignite-zookeeper] [2018-07-06
17:09:28,747][WARN
][zk-client-timer-internal.ZookeeperDiscoverySpiTest9][ZookeeperDiscoveryImpl]
Connection to Zookeeper server is lost, local node SEGMENTED.
{noformat}
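To make the timeout arithmetic concrete, here is a minimal sketch that applies the formula mentioned above (session timeout / Zk servers count) to the numbers from this issue: sesTimeout=2000 and three servers in zkConnectionString, against the 2803 ms JVM pause from the log. The class and method names are hypothetical and exist only for illustration; this is not ZooKeeper or Ignite code, and it ignores any timeout negotiation with the server.

```java
// Hypothetical illustration only: names below are not part of ZooKeeper or Ignite.
public class ZkTimeoutSketch {
    /** Per-server connect timeout, following the formula quoted in the comment. */
    static int connectTimeoutMs(int sessionTimeoutMs, int zkServers) {
        return sessionTimeoutMs / zkServers; // integer division
    }

    /** True if a JVM pause of the given length outlasts the per-server connect timeout. */
    static boolean pauseExceedsConnectTimeout(int pauseMs, int sessionTimeoutMs, int zkServers) {
        return pauseMs > connectTimeoutMs(sessionTimeoutMs, zkServers);
    }

    /** Smallest session timeout whose per-server share is strictly longer than the pause. */
    static int minSessionTimeoutMs(int pauseMs, int zkServers) {
        return (pauseMs + 1) * zkServers;
    }

    public static void main(String[] args) {
        int sesTimeout = 2000; // sesTimeout from the stack trace in the issue
        int servers = 3;       // three addresses in zkConnectionString
        int pause = 2803;      // "Possible too long JVM pause: 2803 milliseconds"

        System.out.println("connect timeout, ms: " + connectTimeoutMs(sesTimeout, servers));
        System.out.println("pause exceeds it: " + pauseExceedsConnectTimeout(pause, sesTimeout, servers));
        System.out.println("session timeout needed, ms: " + minSessionTimeoutMs(pause, servers));
    }
}
```

Under these assumptions the per-server budget is well below the observed pause, and the session timeout would have to grow roughly fourfold to cover it, which is why raising it makes the test noticeably slower.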
> ZookeeperDiscoverySpiTest#testCommunicationFailureResolve_KillRandom always
> fails on TC
> ---------------------------------------------------------------------------------------
>
> Key: IGNITE-8179
> URL: https://issues.apache.org/jira/browse/IGNITE-8179
> Project: Ignite
> Issue Type: Bug
> Components: zookeeper
> Reporter: Sergey Chugunov
> Assignee: Vitaliy Biryukov
> Priority: Major
> Labels: MakeTeamcityGreenAgain
>
> Test fails on TC with the following stack trace:
> {noformat}
> class org.apache.ignite.IgniteCheckedException: Failed to start manager:
> GridManagerAdapter [enabled=true,
> name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
> at
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1698)
> at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1007)
> at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1977)
> at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1720)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1148)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:646)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:882)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:845)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:833)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:799)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrids(GridAbstractTest.java:683)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.startGridsMultiThreaded(GridAbstractTest.java:710)
> at
> org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.startGridsMultiThreaded(GridCommonAbstractTest.java:507)
> at
> org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.startGridsMultiThreaded(GridCommonAbstractTest.java:497)
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testCommunicationFailureResolve_KillRandom(ZookeeperDiscoverySpiTest.java:2742)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at junit.framework.TestCase.runTest(TestCase.java:176)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2080)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:140)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1995)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start
> SPI: ZookeeperDiscoverySpi [zkRootPath=/apacheIgnite,
> zkConnectionString=127.0.0.1:40921,127.0.0.1:35014,127.0.0.1:38754,
> joinTimeout=0, sesTimeout=2000, clientReconnectDisabled=false,
> internalLsnr=null]
> at
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
> at
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:905)
> at
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1693)
> ... 23 more
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to
> initialize Zookeeper nodes
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoveryImpl.initZkNodes(ZookeeperDiscoveryImpl.java:827)
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoveryImpl.startJoin(ZookeeperDiscoveryImpl.java:957)
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoveryImpl.joinTopology(ZookeeperDiscoveryImpl.java:775)
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoveryImpl.startJoinAndWait(ZookeeperDiscoveryImpl.java:693)
> at
> org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi.spiStart(ZookeeperDiscoverySpi.java:471)
> at
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
> ... 25 more
> Caused by:
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperClientFailedException:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode
> = Session expired for /apacheIgnite
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperClient.onZookeeperError(ZookeeperClient.java:758)
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperClient.exists(ZookeeperClient.java:276)
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoveryImpl.initZkNodes(ZookeeperDiscoveryImpl.java:789)
> ... 30 more
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /apacheIgnite
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
> at
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperClient.exists(ZookeeperClient.java:273)
> ... 31 more
> {noformat}
> Test passes locally, investigation of failure conditions on TC is needed.
> The issue may be related to the test itself and isn't caused by broken
> functionality.