Re: [akka-user] Actor systems in the cluster are quarantined too often

Patrik Nordwall Sat, 29 Oct 2016 00:14:00 -0700

How long do your network partitions last?
You have increased the acceptable-heartbeat-pause of the cluster failure
detector, which is good.
You also use auto-down-unreachable-after. Together (60 s + 10 s), those
timeouts mean the cluster should be able to survive a network partition of
about 70 seconds before removing and quarantining the unreachable members.
Do you see quarantine earlier than that?
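
As a rough sketch of that arithmetic, these are the two settings from your
config that I am adding together. The comments are my own reading, and the
failure detector is statistical, so treat 70 seconds as an approximation
rather than a hard limit:

```hocon
akka.cluster {
  failure-detector {
    # A member is marked unreachable after roughly this long without heartbeats.
    acceptable-heartbeat-pause = 60 s
  }
  # An unreachable member is then downed automatically after this extra delay.
  auto-down-unreachable-after = 10s
}
# 60 s + 10 s = about 70 s before removal and quarantine
```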


Regards,
Patrik

On Sat, Oct 29, 2016 at 1:59 AM, Eugene Dzhurinsky <[email protected]>
wrote:

> I have a rather unstable network spanning different geographic locations
> (different hemispheres, actually), and from time to time my actor cluster
> falls apart.
> I wrote a bunch of event interceptors for quarantine events, and the system
> is more or less operable (fewer than 10% of nodes are off-cluster at any
> given moment; they can detect the failures and quarantine events and
> restart themselves), and task recovery works really well. But recently I
> started to observe a lot of restarts of the actor systems. Is there any way
> to make the cluster more stable? Below is my config:
>
> akka {
>
>   actor {
>     provider = "akka.cluster.ClusterActorRefProvider"
>   }
>
>   remote {
>     log-remote-lifecycle-events = on
>     log-sent-messages = off
>     log-received-messages = off
>
>     netty.tcp {
>       hostname = "127.0.0.1"
>       port = 0
>     }
>
>     watch-failure-detector {
>       threshold = 12
>       heartbeat-interval = 10 s
>       acceptable-heartbeat-pause = 60 s
>     }
>
>   }
>
>   cluster {
>     seed-nodes = ["akka.tcp://[email protected]:12551"]
>     auto-down-unreachable-after = 10s
>
>     failure-detector {
>       threshold = 12
>       acceptable-heartbeat-pause = 60 s
>       heartbeat-interval = 10 s
>     }
>
>     use-dispatcher = cluster-dispatcher
>
>   }
>
>   contrib.cluster.pub-sub {
>
>     name = distributedPubSubMediator
>     role = ""
>     routing-logic = broadcast
>     gossip-interval = 1s
>     removed-time-to-live = 120s
>     max-delta-elements = 3000
>
>   }
>
>   loggers = ["akka.event.slf4j.Slf4jLogger"]
>   loglevel = "DEBUG"
>
> }
>
>
> singleton-dispatcher {
>   fork-join-executor.parallelism-min = 1
>   fork-join-executor.parallelism-max = 1
> }
>
>
> cluster-dispatcher {
>   type = "Dispatcher"
>   executor = "fork-join-executor"
>   fork-join-executor {
>     parallelism-min = 2
>     parallelism-max = 4
>   }
> }
>
> Thanks!
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>



-- 

Patrik Nordwall
Akka Tech Lead
Lightbend <http://www.lightbend.com/> -  Reactive apps on the JVM
Twitter: @patriknw

