[ https://issues.apache.org/jira/browse/IGNITE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640243#comment-14640243 ]
Denis Magda commented on IGNITE-752:
------------------------------------

I agree to rename {{failureDetectionThreshold}} to {{failureDetectionTimeout}}; I have nothing against this.
I will also add 3) to the javadocs of both SPIs. The same goes for 2): it sounds better and is shorter.

> Speed up failure detection
> --------------------------
>
>                 Key: IGNITE-752
>                 URL: https://issues.apache.org/jira/browse/IGNITE-752
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Yakov Zhdanov
>            Assignee: Denis Magda
>            Priority: Blocker
>             Fix For: sprint-7
>
>         Attachments: 882.patch, ignite-752.patch
>
>
> I think we can (1) make grid configuration significantly easier and (2) speed
> up failure detection.
> Here are the discovery SPI configuration properties that are responsible for
> failure detection:
> # reconnectCount,
> # sockTimeout,
> # networkTimeout,
> # ackTimeout,
> # maxAckTimeout,
> # heartbeatFrequency
> # maxMissedHeartbeats
> The same goes for the communication SPI:
> # reconnectCount,
> # maxConnTimeout,
> # connTimeout
> So, we have 10 or even more properties.
> We did this to address the half-open sockets problem (which is pretty common
> in cloud environments) and GC pauses that may happen on cluster nodes: we can
> increase ack timeouts to prevent nodes from being kicked out of the topology.
> By setting values for these properties, I effectively set the timeout for
> failure detection. Why do we need such a large number of parameters instead of
> having one on IgniteConfiguration: nodeResponseThreshold (or
> failureDetectionThreshold; can anyone propose a better name?).
> All other parameters will be calculated automatically (I think the user can
> still set some of them for full control over the situation; we need to decide
> if this is needed).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)