Igniters,

This week I've been working on an improvement that detects failures at the cluster nodes' discovery/communication/network levels as quickly as possible and lets the user tune this behavior with a single configuration parameter.

Failure detection has of course existed in Ignite for a long time and the user is able to tune it, BUT there are around *10* configuration parameters that have to be set to achieve the desired result.

Once IGNITE-752 is merged into the main development branch, all of this behavior can be controlled with a single parameter: IgniteConfiguration.failureDetectionThreshold.

Setting the failure detection threshold for a server node makes it possible to detect failed nodes in the cluster topology within a time equal to the threshold's value and to switch to/keep working with only the alive nodes. Setting the threshold for a client node lets us detect connection failures between the client and its router node (a server node that is part of the topology).
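
To make the idea concrete, here is a minimal, self-contained sketch of threshold-based failure detection. It is NOT the code from IGNITE-752: the class and method names (ThresholdFailureDetector, onMessage, failedNodes) are hypothetical, and it assumes a node counts as failed once nothing has been heard from it for longer than the threshold.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Ignite's implementation.
public class ThresholdFailureDetector {
    // The single tuning knob, in milliseconds.
    private final long thresholdMs;

    // Node id -> timestamp (ms) of the last message received from that node.
    private final Map<String, Long> lastSeen = new HashMap<>();

    public ThresholdFailureDetector(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    // Record that a message (e.g. a heartbeat) arrived from the node.
    public void onMessage(String nodeId, long nowMs) {
        lastSeen.put(nodeId, nowMs);
    }

    // Nodes that have been silent for longer than the threshold.
    public List<String> failedNodes(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > thresholdMs) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }
}
```

With a threshold of 10 seconds, a node last heard from 12 seconds ago is reported as failed, while one heard from 4 seconds ago is still considered alive.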

In addition, a bunch of other improvements and simplifications were made at the level of TcpDiscoverySpi and TcpCommunicationSpi. The changes are aggregated here:
https://issues.apache.org/jira/browse/IGNITE-752

The general review has passed. However, if anyone else wants to review the changes or has any thoughts/suggestions, don't hesitate to share them.

Dmitriy S, I would like to ask you to review the documentation changes in any case before I merge.


Regards,
Denis
