[
https://issues.apache.org/jira/browse/IGNITE-13012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118756#comment-17118756
]
Ignite TC Bot commented on IGNITE-13012:
----------------------------------------
{panel:title=Branch: [pull/7835/head] Base: [master] : No blockers
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All*
Results|https://ci.ignite.apache.org/viewLog.html?buildId=5342449&buildTypeId=IgniteTests24Java8_RunAll]
> Make node connection checking rely on the configuration. Simplify node ping
> routine.
> ------------------------------------------------------------------------------------
>
> Key: IGNITE-13012
> URL: https://issues.apache.org/jira/browse/IGNITE-13012
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: iep-45
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Node-to-next-node connection checking has several related drawbacks. We
> should fix the following:
> 1) Make the connection check interval half of the actual failure detection
> timeout. The current value is a constant:
> {code:java}static int ServerImpl.CON_CHECK_INTERVAL = 500{code}
> 2) Make the connection check interval rely on the common time of the last
> sent message of any kind. The current ping is bound to its own timestamp:
> {code:java}ServerImpl.RingMessageWorker.lastTimeConnCheckMsgSent{code}
> This is odd because any discovery message checks the connection, and
> TcpDiscoveryConnectionCheckMessage is just an addition for when the message
> queue has been empty for a long time.
> 3) Remove the additional, randomly appearing, accelerated connection checking.
> Once we do #1, it will become even more useless.
> Although TCP discovery has a connection check period (see #1), it may
> send a ping before this period expires. This premature node ping relies on
> the time of any sent or even received message. Imagine: if node 2 receives no
> message from node 1 within some time, it decides to send an extra ping to
> node 3 without waiting for the regular ping. Such behavior causes confusion
> and brings no benefit.
> See {code:java}ServerImpl.RingMessageWorker.failureThresholdReached{code}
> 4) Do not worry the user with “Node disconnected” when everything is OK. Once
> we do #1, this message will become even more useless. Fixing #3 also fixes
> this issue. When #3 happens, the node logs at INFO level: “Local node seems to
> be disconnected from topology …” although it is not actually disconnected.
> The user can see this unexpected and worrying message if they set
> IgniteConfiguration.failureDetectionTimeout < 500ms.
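
Items 1 and 2 above can be illustrated with a minimal sketch. It is not the actual ServerImpl code: the class ConnectionCheckSketch and all of its members are hypothetical names introduced only for this example, and the clock is passed in explicitly to keep the logic easy to follow. It assumes the interval is half of the configured failure detection timeout, and that a connection check is due only when no message of any kind has been sent for that interval.

```java
// Hypothetical illustration only; none of these names exist in Apache Ignite.
class ConnectionCheckSketch {
    /** Assumed rule from item 1: the interval is half of the failure detection timeout. */
    static long connCheckInterval(long failureDetectionTimeoutMs) {
        // Never return 0, so that very small timeouts still yield a positive interval.
        return Math.max(1, failureDetectionTimeoutMs / 2);
    }

    /** Interval between connection checks, in milliseconds. */
    private final long intervalMs;

    /** Time of the last message of ANY kind sent to the next node (item 2). */
    private long lastSentMsgTimeMs;

    ConnectionCheckSketch(long failureDetectionTimeoutMs, long nowMs) {
        this.intervalMs = connCheckInterval(failureDetectionTimeoutMs);
        this.lastSentMsgTimeMs = nowMs;
    }

    /** Called after any discovery message is sent, not only after a connection check. */
    void onMessageSent(long nowMs) {
        lastSentMsgTimeMs = nowMs;
    }

    /** A dedicated connection check is needed only if nothing was sent for a whole interval. */
    boolean connectionCheckDue(long nowMs) {
        return nowMs - lastSentMsgTimeMs >= intervalMs;
    }
}
```

Under these assumptions, a failure detection timeout of 10000 ms gives a check interval of 5000 ms, and any regular discovery message resets the timer, so a dedicated check message is sent only when the queue stays empty.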
--
This message was sent by Atlassian Jira
(v8.3.4#803005)