[ https://issues.apache.org/jira/browse/IGNITE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215152#comment-16215152 ]
ASF GitHub Bot commented on IGNITE-6071:
----------------------------------------

GitHub user alamar opened a pull request:

    https://github.com/apache/ignite/pull/2906

    IGNITE-6071 White list of exceptions to suppress in createTcpClient.

    Also add wait in discovery infinite loop to avoid grind

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-6071m1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/2906.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2906

----
commit db64729fc9ebb0217f06b0cf9d5e53ab8d657510
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-22T08:29:32Z

    ignite-6124 Fixed NPE in GridDhtPartitionsExchangeFuture.topologyVersion after future cleanup.

    (cherry picked from commit 2c9057a)

commit 5b7724714264c14cc10f4b25abc9234387224e4b
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-22T08:50:35Z

    Fixed javadoc format.

commit 785a85eb0155444b3eef48cf373a990dc4c8c6dd
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-22T09:24:03Z

    ignite-5872 GridDhtPartitionsSingleMessage.partitionUpdateCounters should not return null.

commit 6b506e774c59b64fc6254ea151699c852620a408
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-22T09:24:21Z

    Merge remote-tracking branch 'community/ignite-2.1.4' into ignite-2.1.4

commit 160d9b7c707efc359b4014aa1a481dc0fbbf596f
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-22T11:10:10Z

    Fixed flaky test.

commit 9ed4b72044ba1b2c105761b431625736166af7e7
Author: Alexey Goncharuk <alexey.goncha...@gmail.com>
Date:   2017-08-01T09:25:25Z

    master - Fixed visor compilation after merge

commit 16b819a6131c95a30d8dfaefbac6f6593826258b
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-22T13:40:02Z

    Increased test timeout.

commit 9ab49d4a743c42b4a2f645b8af7611922629c9a3
Author: oleg-ostanin <oosta...@gridgain.com>
Date:   2017-08-22T13:39:31Z

    IGNITE-6155 added new jvm flag for printing gc date stamps

    (cherry picked from commit 03211d2)

commit e780c6b98b5a09cff44ec5d2fa1fd30275ffc35f
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-22T14:10:07Z

    Fixed test to work with new update counter maps.

commit db6add1d2ee17381b810cff3ff978eef4cef51b0
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-22T14:20:07Z

    Removed explicit fail().

commit bc1cc99eab9641753925ef2552ba29831640e9e1
Author: Dmitriy Govorukhin <dmitriy.govoruk...@gmail.com>
Date:   2017-08-22T13:34:31Z

    IGNITE-6154 fix incorrect check checkpoint pages

commit 8dbdd03143362bb39d96242b23975efb22412709
Author: Ivan Rakov <ivan.glu...@gmail.com>
Date:   2017-08-22T14:03:42Z

    IGNITE-6154 also fixed check for WAL record

commit afad8e0fc58160f7876925dc6c3051be7a168155
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-23T09:18:44Z

    Muted hanging test.

commit ad38f7b4b5e6845b2ccccd7eb888f805484504f5
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-23T11:12:42Z

    gg-12662 : Fixed JDBC backward compatibility.

commit 28c906e3e0c51e6f1a4a95b2027d248f9b5035c2
Author: Sergey Chugunov <sergey.chugu...@gmail.com>
Date:   2017-08-02T15:14:46Z

    IGNITE-5542 CacheGroup configuration from cluster is merged with local settings

    (cherry picked from commit 88818ec)

commit caeb11936fa3534b9468d443c11744362044cae5
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-23T12:19:52Z

    ignite-6124 Guard logging with isInfoEnabled

    (cherry picked from commit bebe4d8)

commit 6f407ebabb9dc27459fdbee6423640132b995b1d
Author: tledkov-gridgain <tled...@gridgain.com>
Date:   2017-08-23T12:46:23Z

    IGNITE-6169: Fixed thin JDBC driver compatibility problem.

commit 9dac636c4eef494fe612389c19218eec92057fc0
Author: Ilya Kasnacheev <ilya.kasnach...@gmail.com>
Date:   2017-08-23T13:26:58Z

    IGNITE-4643: Fixed NPE in JdbcDatabaseMetadata.getIndexInfo().

    This closes #2481.

commit 77241cdc45c90ee9bab4a7a0f3d5a1a7664e3426
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-23T13:45:34Z

    Merge remote-tracking branch 'community/ignite-2.1.4' into ignite-2.1.4

commit a5e376f63886696331e5be0c457dc0624c49e3d4
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-23T13:44:04Z

    ignite-6124 Added missed initialization of merged join exchanges in GridDhtPartitionsExchangeFuture.onBecomeCoordinator

    (cherry picked from commit 0c5dca9)

commit be5589db9e0600b295b745ddab5e7aae390ac7ae
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-23T14:25:33Z

    ignite-5986 : Fixed failing .NET test.

commit 43e4ff2c0ecd1ef30d18cf1fbc9052f5ba703d05
Author: sboikov <sboi...@gridgain.com>
Date:   2017-07-18T14:52:51Z

    Fixed test IgniteClusterActivateDeactivateTestWithPersistence.

    (cherry picked from commit 54585ab)

commit d596b7806db3f002f83da5a02bc882d03dae3dfd
Author: Ilya Lantukh <ilant...@gridgain.com>
Date:   2017-08-23T15:23:06Z

    Updated classnames.properties.

commit 3e08cd401d598a34832e72afc5e6c94a3a9ab081
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-23T15:29:52Z

    ignite-6174 Temporary changed test until issue not fixed

    (cherry picked from commit 4fe8f76)

commit 44e0b4cd62142dce8cf39f826449b9a04e22e1cf
Author: Alexey Kuznetsov <akuznet...@apache.org>
Date:   2017-08-24T07:57:36Z

    IGNITE-6136 Fixed version for demo.

    (cherry picked from commit e1bf8d7)

commit 8d1838b03d6c1e5f86dfbb7f41c59895775e20c1
Author: Dmitry Pavlov <dpavlov....@gmail.com>
Date:   2017-07-27T11:51:25Z

    Adjusted memory policy to prevent OOM.

commit a3ec54b16bce1a569fbefba17188ccb4702b82a4
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-24T11:09:12Z

    ignite-6124 DataStreamerImpl: do not wait for exchange future inside cache gateway.

    (cherry picked from commit 3ab523c)

commit 30e6d019a21f4a045a50d7d95a04507e3b646e69
Author: sboikov <sboi...@gridgain.com>
Date:   2017-08-24T11:10:34Z

    Merge remote-tracking branch 'community/ignite-2.1.4' into ignite-2.1.4

commit 41f574a7372ffc04b69809298798f24fb34c161f
Author: Dmitriy Govorukhin <dgovoruk...@gridgain.com>
Date:   2017-08-24T12:58:27Z

    Fixed test.

commit 943736b36d67381157fc2807cd7af4b03d44fef3
Author: nikolay_tikhonov <ntikho...@gridgain.com>
Date:   2017-08-24T15:58:16Z

    Revert "IGNITE-5947 Fixed "ClassCastException when two-dimensional array is fetched from cache".

     * Due to this changes break compatibility with .NET;
     * This fix doesn't cover all cases.

    Signed-off-by: nikolay_tikhonov <ntikho...@gridgain.com>

----

> Client may detect necessity for reconnect for too long
> ------------------------------------------------------
>
>                 Key: IGNITE-6071
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6071
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Yakov Zhdanov
>            Assignee: Ilya Kasnacheev
>
> There was a GC pause on the client that caused the servers to drop the
> client because they could not establish a TCP communication connection.
> It then took some time for the client to detect that it had been dropped.
> During that time the client repeatedly attempted to connect to the server,
> as can be seen in the logs. After the client detected the drop and
> reconnected, the servers fired a node added event and the log flood stopped.
> We need to find out why the client kept reconnecting via communication and
> did not detect the drop for such a long time.
> I hope this can be reproduced in a test:
> * start 2 servers
> * start client
> * suspend all client threads with Thread.suspend() - just filter threads of
>   current JVM by name and suspend ones belonging to the client.
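
The last reproduction step above (filter the threads of the current JVM by name and suspend the client's ones) could be sketched roughly as follows. This is only an illustration and not code from the pull request. The class name `ClientThreadSuspender` and the `%client%` marker are hypothetical; the marker assumes the client node was started with the instance name `client`, since the thread names in the log embed the instance name between percent signs (`%null%` below). `Thread.suspend()` is invoked reflectively because it is deprecated and, on recent JDKs, either throws `UnsupportedOperationException` or is absent altogether, in which case the helper suspends nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test helper; not part of Ignite or of the pull request. */
final class ClientThreadSuspender {
    /** Returns live threads of this JVM (except the caller) whose name contains the marker. */
    static List<Thread> threadsByName(String marker) {
        List<Thread> res = new ArrayList<>();

        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != Thread.currentThread() && t.getName().contains(marker))
                res.add(t);
        }

        return res;
    }

    /**
     * Tries to suspend every given thread and returns how many calls succeeded.
     * Thread.suspend() is looked up reflectively (an assumption made for portability):
     * on JDKs where it is missing or unsupported the thread is simply skipped.
     */
    static int suspendAll(List<Thread> threads) {
        int cnt = 0;

        for (Thread t : threads) {
            try {
                Thread.class.getMethod("suspend").invoke(t);

                cnt++;
            }
            catch (ReflectiveOperationException e) {
                // Method absent, or it threw UnsupportedOperationException: nothing suspended.
            }
        }

        return cnt;
    }
}
```

In a test this would be called as `ClientThreadSuspender.suspendAll(ClientThreadSuspender.threadsByName("%client%"))` after both servers and the client have started, followed by a wait for the servers to report the client as failed.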
> {noformat}
> [10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
> [10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false]
> [10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_FAILED, node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
> [10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52474]
> [10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52482]
> [10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52506]
> [10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52522]
> [10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52538]
> [10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52554]
> [10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52570]
> [10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52586]
> [10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52602]
> [10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52618]
> [10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52634]
> [10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52650]
> [10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52666]
> ...
> [10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43604]
> [10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43620]
> [10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43636]
> [10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0], discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
> [10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false]
> [10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED, node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)