Yakov Zhdanov created IGNITE-6071:
-------------------------------------
Summary: Client may detect necessity for reconnect for too long
Key: IGNITE-6071
URL: https://issues.apache.org/jira/browse/IGNITE-6071
Project: Ignite
Issue Type: Bug
Affects Versions: 2.1
Reporter: Yakov Zhdanov
There was a GC pause on client that caused servers to drop client due to
inability to establish TCP communication connection. Then it took some time for
client to detect that it has been dropped. During that time client many times
attempted to connect to server which can be seen in the logs. After client
detected its drop and reconnected servers fired node added event and no log
flood can be found any more.
We need to find out why client was reconnecting via communication and did not
detect the drop for such a long time.
I hope this can be reproduced in test:
* start 2 servers
* start client
* suspend all client threads with Thread.suspend() - just filter threads of
current JVM by name and suspend ones belonging to the client.
{noformat}
[10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f,
addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0],
discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673, loc=false,
ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
[10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager]
Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
[10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12,
node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d,
addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3,
lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0,
isClient=false], evtNode=TcpDiscoveryNode
[id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1],
sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true,
ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
[10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false]
[10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=5, minorTopVer=0], evt=NODE_FAILED,
node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
[10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52474]
[10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52482]
[10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52506]
[10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52522]
[10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52538]
[10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52554]
[10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52570]
[10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52586]
[10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52602]
[10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52618]
[10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52634]
[10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52650]
[10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:52666]
...
[10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:43604]
[10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:43620]
[10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
rmtAddr=/10.44.3.14:43636]
[10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added
new node to topology: TcpDiscoveryNode
[id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0],
discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false,
ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
[10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager]
Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
[10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10,
node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d,
addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3,
lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0,
isClient=false], evtNode=TcpDiscoveryNode
[id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1],
sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true,
ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
[10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false]
[10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=6, minorTopVer=0], evt=NODE_JOINED,
node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
{noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)