Re: Active nodes aliveness WatchDog

Anton Vinogradov Wed, 08 Apr 2020 02:04:40 -0700

It seems you're talking about Failure Detection (Timeouts).
Will it detect a node failure on an idle cluster?


On Wed, Apr 8, 2020 at 11:52 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> The configuration parameters that I’m aware of are here:
>
>
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html
>
> Other people would be better placed to discuss the internals.
>
> Regards.
> Stephen
>
> > On 8 Apr 2020, at 09:32, Anton Vinogradov <a...@apache.org> wrote:
> >
> > Stephen,
> >
> >> Nodes check on their neighbours and notify the remaining nodes if one
> >> disappears.
> > Could you explain how this works in detail?
> > How can I set or change the check frequency?
> >
> > On Wed, Apr 8, 2020 at 11:13 AM Stephen Darlington <
> > stephen.darling...@gridgain.com> wrote:
> >
> >> This is one of the functions of the DiscoverySPI. Nodes check on their
> >> neighbours and notify the remaining nodes if one disappears. When the
> >> topology changes, it triggers a rebalance, which relocates primary
> >> partitions to live nodes. This is entirely transparent to clients.
> >>
> >> It gets more complex: there’s the partition loss policy, and
> >> rebalancing doesn’t always happen (configurable, persistence, etc.), but
> >> broadly it does what you expect.
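
A rough way to picture the neighbour check described above is a ring in which each live node probes the next one. The sketch below is a toy model only: `RingSketch`, `crash` and `checkRound` are hypothetical names invented for illustration and are not part of Ignite's DiscoverySPI. It assumes a check round can run without any user operation being in flight.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of ring-based neighbour checks (not Ignite code).
final class RingSketch {
    private final List<String> ring = new ArrayList<>();  // current topology, in ring order
    private final Set<String> crashed = new HashSet<>();  // nodes whose process has died

    RingSketch(List<String> nodes) {
        ring.addAll(nodes);
    }

    // Simulates a JVM crash: the node stays in the topology until a neighbour notices.
    void crash(String node) {
        crashed.add(node);
    }

    // Returns a copy of the current topology.
    List<String> topology() {
        return new ArrayList<>(ring);
    }

    // One round of checks: every live node probes the next node in the ring.
    // Dead neighbours are removed from the topology and returned to the caller,
    // which stands in for "notify the remaining nodes".
    List<String> checkRound() {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < ring.size(); i++) {
            String checker = ring.get(i);
            String neighbour = ring.get((i + 1) % ring.size());
            boolean checkerAlive = !crashed.contains(checker);
            if (checkerAlive && crashed.contains(neighbour) && !failed.contains(neighbour)) {
                failed.add(neighbour);
            }
        }
        ring.removeAll(failed);
        return failed;
    }
}
```

In this model a crashed node is dropped by the first check round after the crash, whether or not anyone is using the cluster. Two adjacent crashed nodes take two rounds, because the second one only gets a live neighbour after the first is removed.
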
> >>
> >> Regards,
> >> Stephen
> >>
> >>> On 8 Apr 2020, at 08:40, Anton Vinogradov <a...@apache.org> wrote:
> >>>
> >>> Igniters,
> >>> Do we have a feature that checks node aliveness on a regular basis?
> >>>
> >>> Scenario:
> >>> Precondition
> >>> The cluster has no load, but one node's JVM has crashed.
> >>>
> >>> Expected actual
> >>> The user performs an operation (e.g. a cache put) related to this node
> >>> (via another node) and waits for some timeout to learn that it's dead.
> >>> The cluster starts the switch to relocate primary partitions to alive
> >>> nodes.
> >>> Now the user is able to retry the operation.
> >>>
> >>> Desired
> >>> Some WatchDog checks node aliveness on a regular basis.
> >>> Once a failure is detected, the cluster starts the switch.
> >>> Later, the user performs an operation on an already fixed cluster and
> >>> does not have to wait.
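
To make the "Desired" case concrete, here is a minimal sketch of a heartbeat-based watchdog. Everything in it is an assumption made for illustration: `LivenessWatchdog`, `heartbeat`, `sweep` and the `onFailure` callback (standing in for "the cluster starts the switch") are hypothetical names, not an existing Ignite feature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical watchdog: tracks the last heartbeat per node and reports
// nodes that have been silent for longer than the timeout.
final class LivenessWatchdog {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMs;
    private final Consumer<String> onFailure;  // e.g. trigger the partition switch

    LivenessWatchdog(long timeoutMs, Consumer<String> onFailure) {
        this.timeoutMs = timeoutMs;
        this.onFailure = onFailure;
    }

    // Records that the node was seen alive at the given time.
    void heartbeat(String node, long nowMs) {
        lastSeen.put(node, nowMs);
    }

    // One periodic check: collects nodes whose last heartbeat is older than
    // the timeout, forgets them, and invokes the failure callback for each.
    List<String> sweep(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                failed.add(e.getKey());
            }
        }
        for (String node : failed) {
            lastSeen.remove(node);
            onFailure.accept(node);
        }
        return failed;
    }
}
```

The time is passed in explicitly so that the check is easy to test. In a running system `sweep(System.currentTimeMillis())` could be scheduled with `ScheduledExecutorService.scheduleAtFixedRate`, so a crashed node is noticed within roughly one timeout plus one check period, even when no user operation touches it.
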
> >>>
> >>> It would be good news if the "Desired" case is already Actual.
> >>> Can somebody point to the feature that performs this check?
> >>
> >>
> >>
>
>
>
