[akka-user] akka-cluster - generic failure detector configuration(S)

'Francesco laTorre' via Akka User List Thu, 29 Dec 2016 09:13:08 -0800

Hi hAkkers,

From the generic configuration:
http://doc.akka.io/docs/akka/current/general/configuration.html


I don't really understand the difference between akka.cluster.failure-detector
and akka.remote.*-failure-detector:

*akka* {

  [...]

  *cluster* {


    # Settings for the Phi accrual failure detector
    # (http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf
    # [Hayashibara et al]) used by the cluster subsystem to detect
    # unreachable members.
    # The default PhiAccrualFailureDetector will trigger if there are no
    # heartbeats within the duration
    # heartbeat-interval + acceptable-heartbeat-pause + threshold_adjustment,
    # i.e. around 5.5 seconds with default settings.
    *failure-detector* {

      # FQCN of the failure detector implementation.
      # It must implement akka.remote.FailureDetector and have
      # a public constructor with a com.typesafe.config.Config and
      # akka.actor.EventStream parameter.
      *implementation-class = "akka.remote.PhiAccrualFailureDetector"*

      # How often keep-alive heartbeat messages should be sent to each
      # connection.
      heartbeat-interval = 1 s

      # Defines the failure detector threshold.
      # A low threshold is prone to generate many wrong suspicions but
      # ensures a quick detection in the event of a real crash. Conversely,
      # a high threshold generates fewer mistakes but needs more time to
      # detect actual crashes.
      threshold = 8.0

      # Number of the samples of inter-heartbeat arrival times to adaptively
      # calculate the failure timeout for connections.
      max-sample-size = 1000

      # Minimum standard deviation to use for the normal distribution in
      # AccrualFailureDetector. Too low standard deviation might result in
      # too much sensitivity for sudden, but normal, deviations in heartbeat
      # inter arrival times.
      min-std-deviation = 100 ms

      # Number of potentially lost/delayed heartbeats that will be
      # accepted before considering it to be an anomaly.
      # This margin is important to be able to survive sudden, occasional,
      # pauses in heartbeat arrivals, due to for example garbage collect or
      # network drop.
      acceptable-heartbeat-pause = 3 s

      # Number of member nodes that each member will send heartbeat
      # messages to, i.e. each node will be monitored by this number of
      # other nodes.
      monitored-by-nr-of-members = 5

      # After the heartbeat request has been sent the first failure
      # detection will start after this period, even though no heartbeat
      # message has been received.
      expected-response-after = 1 s

    }

  [...]

}

and

*akka* {

  [...]

  *remote* {

    ### Settings shared by classic remoting and Artery (the new
    ### implementation of remoting)

    # If set to a nonempty string remoting will use the given dispatcher for
    # its internal actors otherwise the default dispatcher is used. Please
    # note that since remoting can load arbitrary 3rd party drivers (see
    # "enabled-transport" and "adapters" entries) it is not guaranteed that
    # every module will respect this setting.
    use-dispatcher = "akka.remote.default-remote-dispatcher"

    # Settings for the failure detector to monitor connections.
    # For TCP it is not important to have fast failure detection, since
    # most connection failures are captured by TCP itself.
    # The default DeadlineFailureDetector will trigger if there are no
    # heartbeats within the duration
    # heartbeat-interval + acceptable-heartbeat-pause, i.e. 20 seconds
    # with the default settings.
    *transport-failure-detector* {

      # FQCN of the failure detector implementation.
      # It must implement akka.remote.FailureDetector and have
      # a public constructor with a com.typesafe.config.Config and
      # akka.actor.EventStream parameter.
      *implementation-class = "akka.remote.DeadlineFailureDetector"*

      # How often keep-alive heartbeat messages should be sent to each
      # connection.
      heartbeat-interval = 4 s

      # Number of potentially lost/delayed heartbeats that will be
      # accepted before considering it to be an anomaly.
      # A margin to the `heartbeat-interval` is important to be able to
      # survive sudden, occasional, pauses in heartbeat arrivals, due to
      # for example garbage collect or network drop.
      acceptable-heartbeat-pause = 16 s
    }

    # Settings for the Phi accrual failure detector
    # (http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf
    # [Hayashibara et al]) used for remote death watch.
    # The default PhiAccrualFailureDetector will trigger if there are no
    # heartbeats within the duration
    # heartbeat-interval + acceptable-heartbeat-pause + threshold_adjustment,
    # i.e. around 12.5 seconds with default settings.
    *watch-failure-detector* {

      # FQCN of the failure detector implementation.
      # It must implement akka.remote.FailureDetector and have
      # a public constructor with a com.typesafe.config.Config and
      # akka.actor.EventStream parameter.
      *implementation-class = "akka.remote.PhiAccrualFailureDetector"*

      # How often keep-alive heartbeat messages should be sent to each
      # connection.
      heartbeat-interval = 1 s

      # Defines the failure detector threshold.
      # A low threshold is prone to generate many wrong suspicions but
      # ensures a quick detection in the event of a real crash. Conversely,
      # a high threshold generates fewer mistakes but needs more time to
      # detect actual crashes.
      threshold = 10.0

      # Number of the samples of inter-heartbeat arrival times to adaptively
      # calculate the failure timeout for connections.
      max-sample-size = 200

      # Minimum standard deviation to use for the normal distribution in
      # AccrualFailureDetector. Too low standard deviation might result in
      # too much sensitivity for sudden, but normal, deviations in heartbeat
      # inter arrival times.
      min-std-deviation = 100 ms

      # Number of potentially lost/delayed heartbeats that will be
      # accepted before considering it to be an anomaly.
      # This margin is important to be able to survive sudden, occasional,
      # pauses in heartbeat arrivals, due to for example garbage collect or
      # network drop.
      acceptable-heartbeat-pause = 10 s


      # How often to check for nodes marked as unreachable by the failure
      # detector
      unreachable-nodes-reaper-interval = 1s

      # After the heartbeat request has been sent the first failure
      # detection will start after this period, even though no heartbeat
      # message has been received.
      expected-response-after = 1 s

    }

    [...]
}

So they are all based on heartbeats, and each one triggers when its value
rises above the configured threshold.
Akka Cluster is built on top of Akka Remote, but the configuration
leaves some ambiguities:

   - default settings for akka.cluster.failure-detector will trigger
   *PhiAccrualFailureDetector* if there are no heartbeats within *5.5s*
   - default settings for akka.remote.watch-failure-detector will trigger
   *PhiAccrualFailureDetector* if there are no heartbeats within *12.5s*

Moreover:

   - akka.cluster.failure-detector is used by the cluster subsystem to
   detect unreachable members.
   - akka.remote.watch-failure-detector is used for remote death watch.
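
As a rough sanity check of the 5.5s and 12.5s figures quoted in the config comments, here is a minimal sketch of the phi accrual idea: phi is taken as -log10 of the probability that a heartbeat is still "just late" after a given silence, modelled with a normal distribution. Two things are assumptions on my part and not taken from the config above: the standard deviation of 250 ms (std_ms), and the use of the exact normal tail via math.erfc, which may differ from what the Akka implementation computes internally. The function names are hypothetical.

```python
import math


def phi(elapsed_ms, mean_ms, std_ms):
    """Suspicion level: -log10 of the normal tail probability beyond elapsed_ms."""
    z = (elapsed_ms - mean_ms) / (std_ms * math.sqrt(2.0))
    tail = 0.5 * math.erfc(z)  # P(X > elapsed_ms) for X ~ N(mean_ms, std_ms)
    return -math.log10(tail)


def trigger_time_ms(heartbeat_interval_ms, acceptable_pause_ms, threshold, std_ms):
    """Smallest silence (in ms) at which phi reaches the threshold, by bisection."""
    # The acceptable pause shifts the expected arrival time, as the config
    # comments describe (heartbeat-interval + acceptable-heartbeat-pause + ...).
    mean_ms = heartbeat_interval_ms + acceptable_pause_ms
    lo, hi = mean_ms, mean_ms + 20 * std_ms  # phi(lo) < threshold < phi(hi)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if phi(mid, mean_ms, std_ms) < threshold:
            lo = mid
        else:
            hi = mid
    return hi


# ASSUMPTION: 250 ms standard deviation; not a value from the quoted config.
ASSUMED_STD_MS = 250.0

# akka.cluster.failure-detector defaults: 1 s interval, 3 s pause, threshold 8.0
cluster_ms = trigger_time_ms(1000, 3000, 8.0, ASSUMED_STD_MS)

# akka.remote.watch-failure-detector defaults: 1 s interval, 10 s pause, threshold 10.0
watch_ms = trigger_time_ms(1000, 10000, 10.0, ASSUMED_STD_MS)

print("cluster detector triggers after about %.1f s" % (cluster_ms / 1000.0))
print("watch detector triggers after about %.1f s" % (watch_ms / 1000.0))
```

Under those assumptions both results land close to the figures in the comments, so the "threshold_adjustment" term appears to be the extra margin that the threshold adds on top of heartbeat-interval + acceptable-heartbeat-pause. A smaller standard deviation gives a smaller margin.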


*Q1*: when using Akka Cluster, if a node goes down (Ctrl+Z, GC pause, network
failure, etc.), which PhiAccrualFailureDetector is triggered, and when?

*Q2*: I've enabled logging at debug level but cannot see either of these
detectors mentioned; the only messages I can see are

16:47:22.978 [activity-feeds-akka.actor.default-dispatcher-16] INFO  a.r.transport.ProtocolStateActor - No response from remote. Transport failure detector triggered. (internal state was Open)
16:47:23.109 [activity-feeds-akka.actor.default-dispatcher-6] INFO  a.r.transport.ProtocolStateActor - No response from remote. Transport failure detector triggered. (internal state was Open)

which seem to come from akka.remote.transport-failure-detector.

Can anyone please help me tune the configuration correctly?

Cheers,
Francesco
