[
https://issues.apache.org/jira/browse/IGNITE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150343#comment-16150343
]
ASF GitHub Bot commented on IGNITE-6071:
----------------------------------------
GitHub user alamar opened a pull request:
https://github.com/apache/ignite/pull/2575
IGNITE-6071 Finite number of attempts in reserveClient()
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gridgain/apache-ignite ignite-6071
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/ignite/pull/2575.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2575
----
commit 1e8e082ced9fc99b20867d05ede08e881a0c503b
Author: Ilya Kasnacheev <[email protected]>
Date: 2017-09-01T10:24:52Z
IGNITE-6071 Finite number of attempts in reserveClient()
----
> Client may detect necessity for reconnect for too long
> ------------------------------------------------------
>
> Key: IGNITE-6071
> URL: https://issues.apache.org/jira/browse/IGNITE-6071
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.1
> Reporter: Yakov Zhdanov
> Assignee: Ilya Kasnacheev
>
> There was a GC pause on client that caused servers to drop client due to
> inability to establish TCP communication connection. Then it took some time
> for client to detect that it has been dropped. During that time client many
> times attempted to connect to server which can be seen in the logs. After
> client detected its drop and reconnected servers fired node added event and
> no log flood can be found any more.
> We need to find out why client was reconnecting via communication and did not
> detect the drop for such a long time.
> I hope this can be reproduced in test:
> * start 2 servers
> * start client
> * suspend all client threads with Thread.suspend() - just filter threads of
> current JVM by name and suspend ones belonging to the client.
> {noformat}
> [10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager]
> Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f,
> addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0,
> XXX.com/10.44.3.14:0], discPort=0, order=2, intOrder=2,
> lastExchangeTime=1502269008673, loc=false, ver=2.1.1#20170618-sha1:09ce29e0,
> isClient=true]
> [10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager]
> Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
> [10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init
> [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12,
> node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d,
> addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
> XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3,
> lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0,
> isClient=false], evtNode=TcpDiscoveryNode
> [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1],
> sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500,
> order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true,
> ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init
> [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false]
> [10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager]
> Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
> [topVer=5, minorTopVer=0], evt=NODE_FAILED,
> node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
> [10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52474]
> [10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52482]
> [10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52506]
> [10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52522]
> [10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52538]
> [10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52554]
> [10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52570]
> [10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52586]
> [10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52602]
> [10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52618]
> [10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52634]
> [10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52650]
> [10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:52666]
> ...
> [10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:43604]
> [10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:43620]
> [10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/10.44.3.11:47100,
> rmtAddr=/10.44.3.14:43636]
> [10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager]
> Added new node to topology: TcpDiscoveryNode
> [id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1],
> sockAddrs=[/127.0.0.1:0,
> gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0], discPort=0, order=6,
> intOrder=5, lastExchangeTime=1502270322805, loc=false,
> ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager]
> Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
> [10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10,
> node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d,
> addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
> XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3,
> lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0,
> isClient=false], evtNode=TcpDiscoveryNode
> [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1],
> sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500,
> order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true,
> ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false]
> [10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager]
> Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
> [topVer=6, minorTopVer=0], evt=NODE_JOINED,
> node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)