[ 
https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689619#comment-16689619
 ] 

ASF GitHub Bot commented on IGNITE-4003:
----------------------------------------

GitHub user ilantukh opened a pull request:

    https://github.com/apache/ignite/pull/5417

    IGNITE-4003

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-4003-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/5417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5417
    
----
commit 7d3aae218f69ef3959026463e97b4f6f2a836cf1
Author: Ilya Lantukh <ilantukh@...>
Date:   2018-11-16T15:39:56Z

    IGNITE-4003 : WIP.

commit c9667fbaf47f37cdffa48bb1c78eafb3550e4c55
Author: Ilya Lantukh <ilantukh@...>
Date:   2018-11-16T16:17:43Z

    IGNITE-4003 : Fixed recovery descriptor initialization.

----


> Slow or faulty client can stall the whole cluster.
> --------------------------------------------------
>
>                 Key: IGNITE-4003
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4003
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, general
>    Affects Versions: 1.7
>            Reporter: Vladimir Ozerov
>            Assignee: Ilya Lantukh
>            Priority: Critical
>
> Steps to reproduce:
> 1) Start two server nodes and some data to cache.
> 2) Start a client from Docker subnet, which is not visible from the outside. 
> Client will join the cluster.
> 3) Try to put something to cache or start another node to force rabalance.
> Cluster is stuck at this moment. Root cause - servers are constantly trying 
> to establish outgoing connection to the client, but fail as Docker subnet is 
> not visible from the outside. It may stop virtually all cluster operations.
> Typical thread dump:
> {code}
> org.apache.ignite.IgniteCheckedException: Failed to send message (node may 
> have left the grid or TCP connection cannot be established due to firewall 
> issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, 
> /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, 
> lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, 
> isClient=true], topic=T4 [topic=TOPIC_CACHE, 
> id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, 
> id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage 
> [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, 
> data=null, futId=null], policy=2]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:202)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:200)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:877)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:859)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_51]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_51]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
> Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to send message 
> to remote node: TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, 
> /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, 
> lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, 
> isClient=true]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1986)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
>  [ignite-core-1.5.23.jar:1.5.23]
>       ... 32 common frames omitted
> Caused by: org.apache.ignite.IgniteCheckedException: Failed to connect to 
> node (is node still alive?). Make sure that each GridComputeTask and 
> GridCacheTransaction has a timeout set in order to prevent parties from 
> waiting forever in case of network issues 
> [nodeId=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[/172.17.0.6:47100, 
> /127.0.0.1:47100]]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2489)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2130)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2024)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1960)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1213)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onResult(GridDhtLockFuture.java:529)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processDhtLockResponse(GridDhtTransactionalCacheAdapter.java:639)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$100(GridDhtTransactionalCacheAdapter.java:89)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:151)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:149)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       ... 12 common frames omitted
>       Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
> to address: /172.17.0.6:47100
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>               ... 35 common frames omitted
>       Caused by: java.net.SocketTimeoutException: null
>               at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2353)
>               ... 35 common frames omitted
>       Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
> to address: /127.0.0.1:47100
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>               ... 35 common frames omitted
>       Caused by: org.apache.ignite.IgniteCheckedException: Remote node ID is 
> not as expected [expected=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> rcvd=48cccf25-7c29-4048-bd52-704acdb552e6]
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2604)
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2361)
>               ... 35 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to