[ 
https://issues.apache.org/jira/browse/IGNITE-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697004#comment-14697004
 ] 

Denis Magda edited comment on IGNITE-1241 at 8/14/15 1:36 PM:
--------------------------------------------------------------

This warning is printed when the local server node is considered to be 
disconnected from the ring, that is, when it neither receives messages nor 
sends them to the next node.

In the buggy configuration (one server node and one client node) the server 
periodically received messages from the client node but was unable to send 
connection check messages to the next node because there was none. This 
broke the failure detection timeout logic.

As a fix, the failure should be detected and reported only when there are 
remote server nodes in the topology and the local node seems to be disconnected 
from them. To support this, the {{TcpDiscovery.hasRemoteServerNodes()}} method 
was implemented and is now used by the failure detection timeout logic.
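
The condition described above can be sketched roughly as follows. This is only 
an illustration, not the actual Ignite code: the {{Node}} class, the 
{{shouldReportDisconnect}} helper and its parameters are hypothetical, and only 
the name {{hasRemoteServerNodes}} is taken from the fix.

```java
import java.util.Arrays;
import java.util.List;

public class FailureDetectionSketch {
    /** Hypothetical, simplified stand-in for a discovery node. */
    static final class Node {
        final String id;
        final boolean client; // true for client nodes
        final boolean local;  // true for the node running this check

        Node(String id, boolean client, boolean local) {
            this.id = id;
            this.client = client;
            this.local = local;
        }
    }

    /** Returns true if the topology has at least one server node besides the local one. */
    static boolean hasRemoteServerNodes(List<Node> topology) {
        for (Node n : topology) {
            if (!n.client && !n.local)
                return true;
        }
        return false;
    }

    /**
     * Hypothetical helper: report a disconnect only when remote server nodes
     * exist AND nothing was exchanged with them within the timeout.
     */
    static boolean shouldReportDisconnect(List<Node> topology, long lastExchangeMs,
        long nowMs, long failureDetectionTimeoutMs) {
        return hasRemoteServerNodes(topology)
            && nowMs - lastExchangeMs >= failureDetectionTimeoutMs;
    }

    public static void main(String[] args) {
        // The configuration from the bug report: one server, one client.
        List<Node> oneServerOneClient = Arrays.asList(
            new Node("server-1", false, true),
            new Node("client-1", true, false));

        // The same topology with a second server added.
        List<Node> twoServers = Arrays.asList(
            new Node("server-1", false, true),
            new Node("server-2", false, false),
            new Node("client-1", true, false));

        System.out.println(shouldReportDisconnect(oneServerOneClient, 0L, 10_000L, 10_000L));
        System.out.println(shouldReportDisconnect(twoServers, 0L, 10_000L, 10_000L));
    }
}
```

With only a client node attached there is no next server in the ring, so the 
check is skipped and the warning is not printed, however long ago the last 
connection check message was sent.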


> Server node prints out failure detection warning if a client node connected
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-1241
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1241
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: ignite-1.4
>            Reporter: Sergey Kozlov
>            Assignee: Denis Magda
>            Priority: Critical
>             Fix For: ignite-1.4
>
>
> 1. Start 1 server node.
> 2. Start 1 client node.
> 3. The server node prints the following messages a few seconds after the 
> topology update:
> {noformat}
> [16:57:49,376][INFO][disco-event-worker-#45%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=531641e9-a279-41d4-b2ad-75bcecb1f8b3, addrs=[0:0:0:0:0:0:0:1, 10.0.0.9, 
> 127.0.0.1, 192.168.222.100, 2001:0:5ef5:79fd:34d0:3c29:f5ff:fff6], 
> sockAddrs=[rr/192.168.222.100:0, /0:0:0:0:0:0:0:1:0, rr/192.168.222.100:0, 
> /10.0.0.9:0, rr/192.168.222.100:0, /127.0.0.1:0, /192.168.222.100:0, 
> /2001:0:5ef5:79fd:34d0:3c29:f5ff:fff6:0], discPort=0, order=2, intOrder=2, 
> lastDataReceivedTime=1439387869353, loc=false, 
> ver=1.4.1#20150812-sha1:d5986c26, isClient=true]
> [16:57:49,381][INFO][disco-event-worker-#45%null%][GridDiscoveryManager] 
> Topology snapshot [ver=2, servers=1, clients=1, CPUs=8, heap=2.0GB]
> [16:57:59,362][INFO][tcp-disco-msg-worker-#2%null][TcpDiscoverySpi] Local 
> node seems to be disconnected from topology (failure detection timeout is 
> reached): [failureDetectionTimeout=10000, connCheckFreq=3333]
> [16:57:59,464][INFO][tcp-disco-msg-worker-#2%null][TcpDiscoverySpi] Local 
> node seems to be disconnected from topology (failure detection timeout is 
> reached): [failureDetectionTimeout=10000, connCheckFreq=3333]
> [16:58:01,464][INFO][tcp-disco-msg-worker-#2%null][TcpDiscoverySpi] Local 
> node seems to be disconnected from topology (failure detection timeout is 
> reached): [failureDetectionTimeout=10000, connCheckFreq=3333]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
