[akka-user] Re: Akka cluster failover- race condition

tomerneeraj Wed, 18 Nov 2015 06:30:02 -0800

Hi,

Thanks for the pointer, Johan. I will have a look at it.


Regards
Neeraj

On Monday, November 16, 2015 at 4:13:15 PM UTC+5:30, Johan Andrén wrote:
>
> Hi,
>
> What you describe is called split brain: one cluster becomes two clusters 
> that do not know of each other because of a communication failure. It can 
> happen when you use the built-in auto downing in akka-cluster. Basically 
> there are two ways around it.
>
> One is to not use auto downing but instead let ops or an external tool 
> monitor your cluster 
> and let it/them manually down a cluster node when you are certain it is 
> actually down.
>
> Another is to implement a more intelligent strategy for auto downing. This 
> is available as a commercial feature from Typesafe called the Split Brain 
> Resolver (SBR). You can read more about it here: 
> http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html
>
> Hope this helps!
> --
> Johan Andrén
> Typesafe -  Reactive apps on the JVM
> Twitter: @apnylle
>
>
>
> On Thursday, November 12, 2015 at 12:28:15 PM UTC+1, tomerneeraj wrote:
>>
>> Hi, 
>>
>> Node here means one VM.
>>
>> We are using Akka cluster, where each node in the cluster is assigned a 
>> specific task. If any node in the cluster goes down, a MEMBER DOWN event 
>> is raised in the cluster. After catching this event, another node starts 
>> processing the tasks assigned to the failed node.
>>
>> This is where the problem pops up. A node can be shown as down in the 
>> cluster because it does not respond to cluster events and a timeout occurs, 
>> i.e. Akka cluster considers it down, whereas in fact it only failed to 
>> respond because of high load or GC events and keeps on processing records 
>> at a slow rate.
>>
>> Now both nodes in the cluster, (1) the node shown as down in the Akka 
>> cluster event and (2) the new node that starts processing because of the 
>> member down event, process the same set of records, and hence race 
>> conditions start occurring.
>>
>> One way around it is to never let a node become overloaded, so that this 
>> event only comes up when the node is actually down, and not when it is 
>> merely shown as down because it failed to respond to the availability 
>> checks in the cluster.
>>
>> But there are also other scenarios that cannot be predicted in advance. 
>> We need some mechanism which guarantees that if a node is reported as 
>> down, it is down in reality.
>>
>> We need advice from the expert group members on how to resolve this, or 
>> on whether it needs to be looked at in a different way.
>>
>> Regards
>> Neeraj
>>
>>
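
As a rough sketch of the first option Johan describes above (no auto downing), 
the relevant settings might look like the fragment below. The keys are 
standard akka-cluster settings; the failure detector values are only 
illustrative assumptions for a node that suffers long GC pauses, not 
recommendations, and would need tuning against your own workload.

```hocon
akka.cluster {
  # "off" disables automatic downing of unreachable members, so a node
  # that is merely slow is never downed without an explicit decision.
  auto-down-unreachable-after = off

  failure-detector {
    # Illustrative values (assumptions): tolerate longer heartbeat gaps
    # before a busy or GC-paused node is marked unreachable.
    threshold = 12.0
    acceptable-heartbeat-pause = 10 s
  }
}
```

With this in place an unreachable node stays in the cluster until someone 
downs it explicitly, for example via JMX or by calling 
Cluster(system).down(address) once it is confirmed to be dead.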

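To illustrate the second option above (a more intelligent downing strategy), 
here is a minimal sketch of a "keep the majority side" rule in plain Java. It 
is not the Split Brain Resolver's implementation and uses no Akka API; the 
class name, the method, and the lowest-address tie-breaker are all 
hypothetical choices made for this example.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Illustrative only: each side of a partition decides on its own whether it
 * should keep running. All names here are hypothetical, not Akka APIs.
 */
final class KeepMajoritySketch {

    enum Decision { DOWN_UNREACHABLE, DOWN_SELF }

    /**
     * @param allMembers addresses of every member known before the partition
     * @param reachable  addresses this node can still reach, including itself
     */
    static Decision decide(Set<String> allMembers, Set<String> reachable) {
        if (allMembers.isEmpty()) {
            throw new IllegalArgumentException("cluster has no members");
        }
        int total = allMembers.size();
        int mine = reachable.size();
        if (2 * mine > total) {
            return Decision.DOWN_UNREACHABLE; // strict majority: keep running
        }
        if (2 * mine < total) {
            return Decision.DOWN_SELF;        // minority: stop processing
        }
        // Exact tie: the side holding the lowest address survives
        // (an assumption of this sketch, so both sides agree on one winner).
        String lowest = new TreeSet<>(allMembers).first();
        return reachable.contains(lowest)
                ? Decision.DOWN_UNREACHABLE
                : Decision.DOWN_SELF;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("node-a", "node-b", "node-c");
        // The two healthy nodes still see each other.
        System.out.println(decide(all, Set.of("node-a", "node-b")));
        // The overloaded node sees only itself.
        System.out.println(decide(all, Set.of("node-c")));
    }
}
```

The point of the rule is that both sides reach complementary decisions from 
the same membership list, so only one side takes over the records. It cannot 
help while a node is frozen in a GC pause, since the node only evaluates the 
rule once it is running again.
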
