[ https://issues.apache.org/jira/browse/IGNITE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215149#comment-16215149 ]
ASF GitHub Bot commented on IGNITE-6071:
----------------------------------------

GitHub user alamar opened a pull request:

    https://github.com/apache/ignite/pull/2903

    IGNITE-6071 White list of exceptions to suppress in createTcpClient.

    Also add wait in discovery infinite loop to avoid grind

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-6071m7

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/2903.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2903

----
commit f8aa957327312d76f90231b9bfe6d386d1d4ec37
Author: Alexey Kuznetsov <akuznet...@apache.org>
Date:   2016-11-11T08:56:42Z

    Reverted wrong commit.

commit c6921a311f437504a45a4667ddde85b14269ba57
Author: Alexey Kuznetsov <akuznet...@apache.org>
Date:   2016-11-11T09:01:33Z

    Fixed classnames.properties generation for ignite-hadoop module.

commit d69e26dd8d4fd9383a149c93c251911a8dd67528
Author: Pavel Tupitsyn <ptupit...@apache.org>
Date:   2016-11-11T09:22:55Z

    IGNITE-4126 .NET: Add IgniteConfiguration.SwapSpaceSpi

commit a70f0bac3ac2487b8ab58598ad921daa952b485f
Author: Andrey V. Mashenkov <andrey.mashen...@gmail.com>
Date:   2016-11-11T10:03:40Z

    IGNITE-4145: Fixes "No query result found for request" exception when running multiple queries concurrently. This closes #1218.

commit 8bb8bdda2e846dcc92a2fd449e64d7594b2700ed
Author: tledkov-gridgain <tled...@gridgain.com>
Date:   2016-11-11T12:01:14Z

    IGNITE-4053: Moved task error output from console to logger. This closes #1160.

commit 7128a395085b60e86436f807b4bdbca83627d41a
Author: sboikov <sboi...@gridgain.com>
Date:   2016-11-11T12:29:38Z

    ignite-4154 Optimize amount of data stored in discovery history

    Discovery history optimizations:
    - remove discarded message for discovery pending messages
    - remove duplicated data from TcpDiscoveryNodeAddedMessage.oldNodesDiscoData
    - do not store unnecessary data in discovery EnsuredMessageHistory
    - use special property for EnsuredMessageHistory size instead of IGNITE_DISCOVERY_HISTORY_SIZE

    Affinity history optimizations:
    - do not store calculated primary/backup maps in history
    - try save the same assignments instance for caches with similar affinity

    Exchange messages optimizations:
    - do not send duplicated partition state maps for caches with similar affinity
    - use zip compression for data sent in exchange messages

commit c3e8a832098887a0fd09b6e8f63d6a8cbaa20eb9
Author: Pavel Tupitsyn <ptupit...@apache.org>
Date:   2016-11-11T15:00:36Z

    .NET: Fix DataStreamerTestTopologyChange tests

commit a2a3bedce1a232c0c1db6f5e2b737ab47be250b0
Author: sboikov <sboi...@gridgain.com>
Date:   2016-11-14T06:44:48Z

    Fixed IgniteStartFromStreamConfigurationTest to stop started node.

commit 85a4b966fdfb7018d1c91b73df1659082128f786
Author: Pavel Tupitsyn <ptupit...@apache.org>
Date:   2016-11-14T10:38:33Z

    IGNITE-4216 .NET: Fix PlatformAffinityFunction to inject resource into baseFunc

commit 6e36a7950db84913ddfd0d98f5a0b50923d2a29c
Author: tledkov-gridgain <tled...@gridgain.com>
Date:   2016-11-15T09:42:29Z

    IGNITE-3191: Fields are now sorted for binary objects which don't implement Binarylizable interface. This closes #1197.

commit e39888a08da313bec4d30f96488eccb36b4abacc
Author: Vasiliy Sisko <vsi...@gridgain.com>
Date:   2016-11-17T04:41:05Z

    IGNITE-4163 Fixed load range queries.

commit 3eacc0b59c27be6b4b3aaa09f84b867ba42b449f
Author: Alexey Kuznetsov <akuznet...@apache.org>
Date:   2016-11-21T10:28:56Z

    Merged ignite-1.7.3 into ignite-1.7.4.

commit 0234f67390c88dceefd6e62de98adb922b4ba9ac
Author: Alexey Kuznetsov <akuznet...@apache.org>
Date:   2016-11-21T10:40:50Z

    IGNITE-3443 Implemented metrics for queries monitoring.

commit a24a394bb66ba0237a9e9ef940707d422b2980f0
Author: Konstantin Dudkov <kdud...@ya.ru>
Date:   2016-11-21T10:53:58Z

    IGNITE-2523 "single put" NEAR update request

commit 88f38ac6305578946f2881b12d2d557bd561f67d
Author: Konstantin Dudkov <kdud...@ya.ru>
Date:   2016-11-21T12:11:09Z

    IGNITE-3074 Optimize DHT atomic update future

commit 51ca24f2db32dff9c0034603ea3abfe5ef5cd846
Author: Konstantin Dudkov <kdud...@ya.ru>
Date:   2016-11-21T13:48:44Z

    IGNITE-3075 Implement single key-value pair DHT request/response for ATOMIC cache.

commit 6e4a279e34584881469a7d841432e6c38db2f06f
Author: tledkov-gridgain <tled...@gridgain.com>
Date:   2016-11-21T14:15:17Z

    IGNITE-2355: fix test - clear client connections before and after a test.

commit 551f90dbeebcad35a0e3aac07229fb67578f2ab7
Author: tledkov-gridgain <tled...@gridgain.com>
Date:   2016-11-21T14:16:49Z

    Merge remote-tracking branch 'community/ignite-1.7.4' into ignite-1.7.4

commit f2dc1d71705b86428a04a69c4f2d4ee3a82ed1bd
Author: sboikov <sboi...@gridgain.com>
Date:   2016-11-21T15:12:27Z

    Merged ignite-1.6.11 into ignite-1.7.4.

commit d32fa21b673814b060d2362f06ff44838e9c2cdc
Author: sboikov <sboi...@gridgain.com>
Date:   2016-11-22T08:33:55Z

    IGNITE-3075 Fixed condition for 'single' request creation

commit d15eba4becf7515b512c1032b193ce75e1589177
Author: Anton Vinogradov <a...@apache.org>
Date:   2016-11-22T08:56:20Z

    IGNITE-4225 DataStreamer can hang on changing topology

commit f80bfbd19e7870554bf3abd13bde89b0f39aaee1
Author: Anton Vinogradov <a...@apache.org>
Date:   2016-11-22T09:02:57Z

    IGNITE-3748 Data rebalancing of large cache can hang out.

commit bc695f8e3306c6d74d4fe53d9a98adedd43ad8f0
Author: Igor Sapego <isap...@gridgain.com>
Date:   2016-11-22T09:05:15Z

    IGNITE-4227: ODBC: Implemented SQLError. This closes #1237.

commit fc9ee6a74fe0bf413ab0643d2776a1a43e6dd5d2
Author: devozerov <voze...@gridgain.com>
Date:   2016-11-22T09:05:32Z

    Merge remote-tracking branch 'upstream/ignite-1.7.4' into ignite-1.7.4

commit 861fab9d0598ca2f06c4a6f293bf2866af31967c
Author: tledkov-gridgain <tled...@gridgain.com>
Date:   2016-11-22T09:52:03Z

    IGNITE-4239: add GridInternal annotation for tasks instead of jobs. This closes #1250.

commit ba99df1554fbd1de2b2367b6ce011a024cd199bd
Author: tledkov-gridgain <tled...@gridgain.com>
Date:   2016-11-22T10:07:20Z

    IGNITE-4239: test cleanup

commit c34d27423a0c45c61341c1fcb3f56727fb91498f
Author: Igor Sapego <isap...@gridgain.com>
Date:   2016-11-22T11:13:28Z

    IGNITE-4100: Fix for DEVNOTES paths.

commit 9d82f2ca06fa6069c1976cc75814874256b24f8c
Author: devozerov <voze...@gridgain.com>
Date:   2016-11-22T12:05:29Z

    IGNITE-4259: Fixed a problem with geospatial indexes and BinaryMarshaller.

commit b038730ee56a662f73e02bbec83eb1712180fa82
Author: isapego <igors...@gmail.com>
Date:   2016-11-23T09:05:54Z

    IGNITE-4249: ODBC: Fixed performance issue caused by inefficient IO handling on CPP side. This closes #1254.

commit 7a47a0185d308cd3a58c7bfcb4d1cd548bff5b87
Author: devozerov <voze...@gridgain.com>
Date:   2016-11-24T08:14:08Z

    IGNITE-4270: Allow GridUnsafe.UNALIGNED flag override.

----

> Client may detect necessity for reconnect for too long
> ------------------------------------------------------
>
>                 Key: IGNITE-6071
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6071
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Yakov Zhdanov
>            Assignee: Ilya Kasnacheev
>
> There was a GC pause on the client that caused the servers to drop the
> client due to the inability to establish a TCP communication connection.
> It then took some time for the client to detect that it had been dropped.
> During that time the client repeatedly attempted to connect to the server,
> which can be seen in the logs.
> After the client detected that it had been dropped and reconnected, the
> servers fired a node added event, and from then on the log flood no longer
> appears.
> We need to find out why the client kept reconnecting via communication and
> did not detect the drop for such a long time.
> I hope this can be reproduced in a test:
> * start 2 servers
> * start a client
> * suspend all client threads with Thread.suspend() - just filter the threads
>   of the current JVM by name and suspend the ones belonging to the client.
> {noformat}
> [10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
> [10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false]
> [10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_FAILED, node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
> [10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52474]
> [10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52482]
> [10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52506]
> [10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52522]
> [10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52538]
> [10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52554]
> [10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52570]
> [10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52586]
> [10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52602]
> [10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52618]
> [10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52634]
> [10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52650]
> [10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52666]
> ...
> [10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43604]
> [10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43620]
> [10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43636]
> [10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0], discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
> [10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false]
> [10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED, node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
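
The last reproduction step quoted above (filter the threads of the current JVM by name and suspend the ones belonging to the client) might be sketched roughly as follows. This is only an illustration, not code from the pull request: the class `ClientThreadSuspender`, its method names, and the `%client%` name fragment are hypothetical, and starting the two servers and the client is left out. The fragment assumes that the client's threads carry its instance name between percent signs, in the same way the server threads in the log carry `%null%`. `Thread.suspend()` is deprecated and is not available on recent JDKs, so the sketch looks it up reflectively and reports whether the suspension actually happened.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical test helper; not part of Ignite or of the pull request. */
class ClientThreadSuspender {
    /** Returns the live threads of the current JVM whose name contains the given fragment. */
    static List<Thread> threadsByName(String nameFragment) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.getName().contains(nameFragment))
            .collect(Collectors.toList());
    }

    /**
     * Tries to suspend the given threads to imitate a long pause on the client.
     * Returns false if Thread.suspend() is missing or unsupported on this JDK.
     */
    static boolean suspendAll(List<Thread> threads) {
        return invokeOnAll("suspend", threads);
    }

    /** Tries to resume threads previously suspended by suspendAll(). */
    static boolean resumeAll(List<Thread> threads) {
        return invokeOnAll("resume", threads);
    }

    /** Calls a no-arg Thread method by name so the sketch compiles even where the method is gone. */
    private static boolean invokeOnAll(String methodName, List<Thread> threads) {
        try {
            Method m = Thread.class.getMethod(methodName);

            for (Thread t : threads)
                m.invoke(t);

            return true;
        }
        catch (ReflectiveOperationException e) {
            // Method is absent, or it threw (for example UnsupportedOperationException).
            return false;
        }
    }
}
```

In a test one would presumably call `suspendAll(threadsByName("%client%"))` after the client has joined, wait until the servers report `Node FAILED`, then call `resumeAll(...)` and measure how long the client takes to notice the drop and reconnect.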