[akka-user] Re: akka-cluster - generic failure detector configuration(S)

'Francesco laTorre' via Akka User List Mon, 02 Jan 2017 14:01:47 -0800

Hi there,

Does anyone have any clues on this?
It would be great if we could get some help clarifying this aspect.


Cheers,
Francesco

On 29 December 2016 at 17:12, Francesco laTorre <[email protected]> wrote:

> Hi hAkkers,
>
> From the general configuration reference:
> http://doc.akka.io/docs/akka/current/general/configuration.html
>
> I don't really understand the differences between
> akka.cluster.failure-detector and akka.remote.*-failure-detector:
>
> akka {
>
>   [...]
>
>   cluster {
>
>
>     # Settings for the Phi accrual failure detector (http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf
>     # [Hayashibara et al]) used by the cluster subsystem to detect unreachable
>     # members.
>     # The default PhiAccrualFailureDetector will trigger if there are no heartbeats within
>     # the duration heartbeat-interval + acceptable-heartbeat-pause + threshold_adjustment,
>     # i.e. around 5.5 seconds with default settings.
>     failure-detector {
>
>       # FQCN of the failure detector implementation.
>       # It must implement akka.remote.FailureDetector and have
>       # a public constructor with a com.typesafe.config.Config and
>       # akka.actor.EventStream parameter.
>       implementation-class = "akka.remote.PhiAccrualFailureDetector"
>
>       # How often keep-alive heartbeat messages should be sent to each connection.
>       heartbeat-interval = 1 s
>
>       # Defines the failure detector threshold.
>       # A low threshold is prone to generate many wrong suspicions but ensures
>       # a quick detection in the event of a real crash. Conversely, a high
>       # threshold generates fewer mistakes but needs more time to detect
>       # actual crashes.
>       threshold = 8.0
>
>       # Number of the samples of inter-heartbeat arrival times to adaptively
>       # calculate the failure timeout for connections.
>       max-sample-size = 1000
>
>       # Minimum standard deviation to use for the normal distribution in
>       # AccrualFailureDetector. Too low standard deviation might result in
>       # too much sensitivity for sudden, but normal, deviations in heartbeat
>       # inter arrival times.
>       min-std-deviation = 100 ms
>
>       # Number of potentially lost/delayed heartbeats that will be
>       # accepted before considering it to be an anomaly.
>       # This margin is important to be able to survive sudden, occasional,
>       # pauses in heartbeat arrivals, due to for example garbage collect or
>       # network drop.
>       acceptable-heartbeat-pause = 3 s
>
>       # Number of member nodes that each member will send heartbeat messages to,
>       # i.e. each node will be monitored by this number of other nodes.
>       monitored-by-nr-of-members = 5
>
>       # After the heartbeat request has been sent the first failure detection
>       # will start after this period, even though no heartbeat message has
>       # been received.
>       expected-response-after = 1 s
>
>     }
>
>   [...]
>
> }
>
> and
>
> akka {
>
>   [...]
>
>   remote {
>
>     ### Settings shared by classic remoting and Artery (the new implementation of remoting)
>
>     # If set to a nonempty string remoting will use the given dispatcher for
>     # its internal actors otherwise the default dispatcher is used. Please note
>     # that since remoting can load arbitrary 3rd party drivers (see
>     # "enabled-transport" and "adapters" entries) it is not guaranteed that
>     # every module will respect this setting.
>     use-dispatcher = "akka.remote.default-remote-dispatcher"
>
>     # Settings for the failure detector to monitor connections.
>     # For TCP it is not important to have fast failure detection, since
>     # most connection failures are captured by TCP itself.
>     # The default DeadlineFailureDetector will trigger if there are no heartbeats within
>     # the duration heartbeat-interval + acceptable-heartbeat-pause, i.e. 20 seconds
>     # with the default settings.
>     transport-failure-detector {
>
>       # FQCN of the failure detector implementation.
>       # It must implement akka.remote.FailureDetector and have
>       # a public constructor with a com.typesafe.config.Config and
>       # akka.actor.EventStream parameter.
>       implementation-class = "akka.remote.DeadlineFailureDetector"
>
>       # How often keep-alive heartbeat messages should be sent to each connection.
>       heartbeat-interval = 4 s
>
>       # Number of potentially lost/delayed heartbeats that will be
>       # accepted before considering it to be an anomaly.
>       # A margin to the `heartbeat-interval` is important to be able to survive sudden,
>       # occasional, pauses in heartbeat arrivals, due to for example garbage collect or
>       # network drop.
>       acceptable-heartbeat-pause = 16 s
>     }
>
>     # Settings for the Phi accrual failure detector (http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf
>     # [Hayashibara et al]) used for remote death watch.
>     # The default PhiAccrualFailureDetector will trigger if there are no heartbeats within
>     # the duration heartbeat-interval + acceptable-heartbeat-pause + threshold_adjustment,
>     # i.e. around 12.5 seconds with default settings.
>     watch-failure-detector {
>
>       # FQCN of the failure detector implementation.
>       # It must implement akka.remote.FailureDetector and have
>       # a public constructor with a com.typesafe.config.Config and
>       # akka.actor.EventStream parameter.
>       implementation-class = "akka.remote.PhiAccrualFailureDetector"
>
>       # How often keep-alive heartbeat messages should be sent to each connection.
>       heartbeat-interval = 1 s
>
>       # Defines the failure detector threshold.
>       # A low threshold is prone to generate many wrong suspicions but ensures
>       # a quick detection in the event of a real crash. Conversely, a high
>       # threshold generates fewer mistakes but needs more time to detect
>       # actual crashes.
>       threshold = 10.0
>
>       # Number of the samples of inter-heartbeat arrival times to adaptively
>       # calculate the failure timeout for connections.
>       max-sample-size = 200
>
>       # Minimum standard deviation to use for the normal distribution in
>       # AccrualFailureDetector. Too low standard deviation might result in
>       # too much sensitivity for sudden, but normal, deviations in heartbeat
>       # inter arrival times.
>       min-std-deviation = 100 ms
>
>       # Number of potentially lost/delayed heartbeats that will be
>       # accepted before considering it to be an anomaly.
>       # This margin is important to be able to survive sudden, occasional,
>       # pauses in heartbeat arrivals, due to for example garbage collect or
>       # network drop.
>       acceptable-heartbeat-pause = 10 s
>
>
>       # How often to check for nodes marked as unreachable by the failure
>       # detector
>       unreachable-nodes-reaper-interval = 1s
>
>       # After the heartbeat request has been sent the first failure detection
>       # will start after this period, even though no heartbeat message has
>       # been received.
>       expected-response-after = 1 s
>
>     }
>
>     [...]
> }
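The transport-failure-detector is the simplest of the three: according to its comment, the DeadlineFailureDetector fires once no heartbeat has arrived within heartbeat-interval + acceptable-heartbeat-pause, i.e. 4 s + 16 s = 20 s with the defaults. Below is a minimal Python model of that rule only. `DeadlineDetectorSketch` is a hypothetical class, not the Akka implementation; the injectable clock is there to keep the example deterministic, and treating the connection as "available" before the first heartbeat is an assumption of this sketch.

```python
import time
from typing import Callable, Optional


class DeadlineDetectorSketch:
    """Hypothetical model of a deadline-based failure detector.

    The monitored connection counts as available while
    last_heartbeat + heartbeat_interval + acceptable_pause lies in the future.
    Assumption: before any heartbeat arrives, it is treated as available.
    """

    def __init__(self, heartbeat_interval_s: float, acceptable_pause_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.deadline_s = heartbeat_interval_s + acceptable_pause_s
        self.clock = clock
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self) -> None:
        # Record the arrival time of the latest heartbeat.
        self.last_heartbeat = self.clock()

    def is_available(self) -> bool:
        if self.last_heartbeat is None:
            return True  # nothing received yet, so nothing to time out
        return self.last_heartbeat + self.deadline_s > self.clock()


# A manually controlled clock keeps the example deterministic.
now = [0.0]
fd = DeadlineDetectorSketch(4.0, 16.0, clock=lambda: now[0])  # transport defaults
fd.heartbeat()            # heartbeat arrives at t = 0 s
now[0] = 19.5
print(fd.is_available())  # True: still inside the 20 s deadline
now[0] = 20.5
print(fd.is_available())  # False: deadline exceeded, the detector would trigger
```

Unlike the phi accrual detectors, this rule does not adapt to the observed heartbeat timing; only the two configured durations matter.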
>
> So they are all based on heartbeats and trigger when some value crosses a
> threshold.
> Akka Cluster is built on top of Akka Remote, but the configuration
> leaves some ambiguities:
>
>    - with default settings, the PhiAccrualFailureDetector configured under
>    akka.cluster.failure-detector triggers if there are no heartbeats within
>    about *5.5 s*
>    - with default settings, the PhiAccrualFailureDetector configured under
>    akka.remote.watch-failure-detector triggers if there are no heartbeats
>    within about *12.5 s*
>
> Moreover:
>
>    - akka.cluster.failure-detector is used by the cluster subsystem to
>    detect unreachable members.
>    - akka.remote.watch-failure-detector is used for remote death watch.
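The 5.5 s and 12.5 s figures in the config comments can be roughly reproduced with a back-of-the-envelope phi calculation. This is a minimal Python sketch under several assumptions: it uses the logistic approximation of the normal CDF (constants 1.5976 and 0.070566) that PhiAccrualFailureDetector appears to use; it assumes a steady state where the mean heartbeat inter-arrival time equals heartbeat-interval, with acceptable-heartbeat-pause added on top of that mean; and it tries two standard deviations, the 100 ms min-std-deviation floor and 250 ms (a quarter of the interval, which is what the detector seems to seed its history with). `phi` and `trigger_after_s` are hypothetical helpers, not Akka API.

```python
import math


def phi(time_diff_ms: float, mean_ms: float, std_ms: float) -> float:
    """Suspicion level phi for a given time since the last heartbeat.

    phi = -log10(e / (1 + e)) with e = exp(-z) and
    z = y * (1.5976 + 0.070566 * y^2), y = (time_diff - mean) / std.
    That simplifies to log10(1 + exp(z)), computed here without overflow.
    """
    y = (time_diff_ms - mean_ms) / std_ms
    z = y * (1.5976 + 0.070566 * y * y)
    if z > 0:
        return (z + math.log1p(math.exp(-z))) / math.log(10)
    return math.log1p(math.exp(z)) / math.log(10)


def trigger_after_s(interval_s: float, pause_s: float,
                    threshold: float, std_ms: float) -> float:
    """Seconds without heartbeats until phi reaches the threshold.

    Assumes the mean inter-arrival time equals the heartbeat interval and
    that the acceptable pause is added to that mean.
    """
    mean_ms = (interval_s + pause_s) * 1000.0
    lo, hi = mean_ms, mean_ms + 50 * std_ms  # phi(lo) ~ 0.3, phi(hi) is huge
    for _ in range(100):  # bisection: phi grows monotonically with time_diff
        mid = (lo + hi) / 2
        if phi(mid, mean_ms, std_ms) < threshold:
            lo = mid
        else:
            hi = mid
    return hi / 1000.0


for std_ms in (100.0, 250.0):
    cluster = trigger_after_s(1, 3, 8.0, std_ms)   # akka.cluster.failure-detector
    watch = trigger_after_s(1, 10, 10.0, std_ms)   # akka.remote.watch-failure-detector
    print(f"std {std_ms:.0f} ms: cluster ~{cluster:.2f} s, watch ~{watch:.2f} s")
```

With a 250 ms standard deviation the results land at about 5.3 s and 12.4 s, close to the comment figures; with the 100 ms floor both drop by roughly a second. In either case the cluster detector's effective timeout is several seconds shorter than the watch detector's, mainly because of the 3 s vs 10 s acceptable-heartbeat-pause.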
>
>
> *Q1*: When using Akka Cluster, if a node goes down (Ctrl+Z, GC, network
> failure, etc.), which PhiAccrualFailureDetector is triggered, and when?
>
> *Q2*: I've enabled logging at debug level but cannot see either of these
> detectors mentioned; the only relevant output I can see is
>
> 16:47:22.978 [activity-feeds-akka.actor.default-dispatcher-16] INFO  a.r.transport.ProtocolStateActor - No response from remote. Transport failure detector triggered. (internal state was Open)
> 16:47:23.109 [activity-feeds-akka.actor.default-dispatcher-6] INFO  a.r.transport.ProtocolStateActor - No response from remote. Transport failure detector triggered. (internal state was Open)
>
> which seems to come from akka.remote.transport-failure-detector.
>
> Can anyone please help me tune the configuration correctly?
>
> Cheers,
> Francesco
>
>

