[akka-user] Re: Akka cluster failover- race condition

Johan Andrén Mon, 16 Nov 2015 02:43:59 -0800

Hi,

What you describe is called split brain: one cluster becomes two clusters 
that do not know of each other because of a communication failure. It can 
happen when you use the built-in auto downing in akka-cluster. Basically 
there are two ways around it.


One is to not use auto downing and instead have ops or an external tool 
monitor your cluster and manually down a cluster node only once they are 
certain it is actually down.
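
For this first option, a minimal sketch of the relevant configuration. It 
assumes the Akka 2.3/2.4-era setting name 
akka.cluster.auto-down-unreachable-after, so check the reference.conf of the 
version you run:

```
# application.conf (HOCON)
akka.cluster {
  # "off" disables automatic downing. A duration such as 10s would instead
  # down an unreachable node automatically after that time, which is what
  # can lead to split brain when the node is only slow (high load, long GC).
  auto-down-unreachable-after = off
}
```

With this setting an unreachable node stays marked as unreachable until 
something downs it explicitly, for example through JMX or programmatically 
with Cluster(system).down(address).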

Another is to implement a more intelligent strategy for auto downing. This 
is available as a commercial feature from Typesafe called the Split Brain 
Resolver (SBR). You can read more about it here: 
http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html

Hope this helps!
--
Johan Andrén
Typesafe -  Reactive apps on the JVM
Twitter: @apnylle



On Thursday, November 12, 2015 at 12:28:15 PM UTC+1, tomerneeraj wrote:
>
> Hi, 
>
> Node here means one VM.
>
> We are using an Akka cluster where each node is assigned a specific 
> task. If any node in the cluster goes down, a MEMBER DOWN event is raised 
> in the cluster. After catching this event, another node starts processing 
> the tasks assigned to the failed node.
>
> This is where the problem pops up. A node is shown as down in the cluster 
> because it does not respond to cluster events and a timeout occurs, i.e. 
> Akka cluster considers it down. In reality it fails to respond only 
> because of high load or GC events, and it keeps on processing records at a 
> slow rate.
>
> Now both nodes, (1) the one shown as down in the Akka cluster and (2) the 
> new node that took over because of the member down event, process the same 
> set of records, and hence race conditions start occurring.
>
> One way around it is to never let a node get overloaded. In that case the 
> event would only come up when the node is actually down, and not because 
> the node failed to respond in time to the cluster's availability checks.
>
> But there are also other scenarios that cannot be predicted in advance. 
> We need some mechanism that guarantees that a node reported as down is 
> down in reality.
>
> We need the expert group members' advice on how to resolve this, or on 
> whether it needs to be looked at in a different way.
>
> Regards
> Neeraj
>
>
