Igniters,
This week I've been working on an improvement that detects failures at
the cluster nodes' discovery, communication, and network levels as
quickly as possible and lets the user tune this behavior with a single
configuration parameter.
Sure, failure detection has existed in Ignite for a long time and the
user is able to tune it, BUT there are around *10* configuration
parameters that have to be set up to achieve the desired result.
Once IGNITE-752 is merged to the main development branch, all of this
behavior can be controlled with a single parameter:
IgniteConfiguration.failureDetectionThreshold.
Setting the failure detection threshold on a server node makes it
possible to detect failed nodes in the cluster topology within a time
equal to the threshold's value and to switch to, or keep working with,
only the alive nodes.
Setting the threshold on a client node lets us detect connection
failures between the client and its router node (a server node that is
part of the topology).
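To make the idea of a single threshold concrete, here is a minimal,
self-contained sketch. It is NOT the code from IGNITE-752: the class
name ThresholdFailureDetector and its methods are hypothetical, and it
only assumes that a node counts as failed once nothing has been heard
from it for longer than the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration of threshold-based failure detection.
 * Not part of Ignite; names and logic are assumptions for this sketch.
 */
final class ThresholdFailureDetector {
    /** The single tuning knob: maximum allowed silence, in milliseconds. */
    private final long thresholdMs;

    /** Last time (ms) a message was seen from each node, sorted by node id. */
    private final Map<String, Long> lastSeenMs = new TreeMap<>();

    ThresholdFailureDetector(long thresholdMs) {
        if (thresholdMs <= 0)
            throw new IllegalArgumentException("thresholdMs must be positive");
        this.thresholdMs = thresholdMs;
    }

    /** Records that a heartbeat or any other message arrived from the node. */
    void onMessage(String nodeId, long nowMs) {
        lastSeenMs.put(nodeId, nowMs);
    }

    /** Returns ids of nodes that have been silent for longer than the threshold. */
    List<String> failedNodes(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMs.entrySet()) {
            if (nowMs - e.getValue() > thresholdMs)
                failed.add(e.getKey());
        }
        return failed;
    }
}
```

With a threshold of 10000 ms, a node last heard from at t=0 is still
considered alive at t=10000 and is reported as failed at any later
check, while nodes that keep sending messages stay in the topology.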
In addition, a bunch of other improvements and simplifications were made
at the level of TcpDiscoverySpi and TcpCommunicationSpi. The changes are
aggregated here:
https://issues.apache.org/jira/browse/IGNITE-752
The general review has passed. However, if anyone else wants to review
the changes or has any thoughts or suggestions, don't hesitate to share
them.
Dmitriy S, I would like to ask you to review the documentation changes
in any case before I merge.
Regards,
Denis