Thanks, that makes sense. I have to read more about the gossip protocol :)

I think I will use manual intervention by setting up an AWS service to 
monitor nodes.

I could implement a simple Split Brain Resolver (SBR), but there would be 
many cases I don't know how to solve at the moment:

1) If I run 2 nodes with dynamic IPs and there is a network partition, the 
SBR is not in a position to determine what to do.
2) To implement a keep-majority algorithm, I think I would need to run an 
odd number of nodes, and the SBR would know which nodes to down based on a 
threshold.
3) Even with an odd number of nodes (like 3), what if all of them become 
unreachable from each other? That is the same scenario as 1, and the SBR 
wouldn't know what to do.
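
To make the three cases concrete, here is a rough sketch of the keep-majority 
idea as a pure decision function. It is only an illustration and makes 
assumptions: the names (Decision, keep_majority) are made up, it does not use 
any Akka API, the "lowest address wins" tie-break is one possible rule, and it 
assumes every node has the same view of the cluster membership before the 
partition happens.

```python
from enum import Enum


class Decision(Enum):
    DOWN_UNREACHABLE = "down-unreachable"  # this side survives and downs the others
    DOWN_SELF = "down-self"                # this side shuts itself down


def keep_majority(reachable, unreachable):
    """Decide what the nodes on this side of a partition should do.

    reachable and unreachable are sets of node addresses as seen locally;
    the local node is counted as reachable. (Hypothetical helper.)
    """
    total = len(reachable) + len(unreachable)
    if 2 * len(reachable) > total:
        # Strict majority: safe to down the nodes we cannot see.
        return Decision.DOWN_UNREACHABLE
    if 2 * len(reachable) < total:
        # Strict minority: the other side may be alive, so stop ourselves.
        return Decision.DOWN_SELF
    # Exact tie (even cluster size, e.g. 2 nodes). Both sides must reach the
    # same verdict without talking to each other, so use a deterministic
    # rule: the side that holds the lowest address survives (an assumption).
    lowest = min(reachable | unreachable)
    if lowest in reachable:
        return Decision.DOWN_UNREACHABLE
    return Decision.DOWN_SELF


# Case 1: two nodes, partitioned. Exactly one side survives.
print(keep_majority({"a"}, {"b"}))       # side holding "a"
print(keep_majority({"b"}, {"a"}))       # side holding "b"

# Case 3: three nodes, all isolated. Every node is in a minority of 1.
print(keep_majority({"a"}, {"b", "c"}))
```

With this rule, case 3 ends with every node downing itself: the whole cluster 
stops, which costs availability but avoids split-brain. The sketch also only 
helps if each side really acts on a DOWN_SELF verdict, since the surviving 
side cannot verify it from the outside.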

Sounds like a difficult mess to implement!

I wonder if the commercial package solves those cases, and how?

Thanks Justin!

Sebastian.



On Thursday, September 14, 2017 at 9:39:29 (UTC-3), Justin du coeur 
wrote:
>
> On Wed, Sep 13, 2017 at 5:55 PM, Sebastian Oliveri <[email protected]> 
> wrote:
>
>> Am I heading in the right direction? I was thinking more of a server that 
>> crashes than of a vertical network partition affecting many nodes...
>>
>
> The problem is, how do you tell the difference?  Specifically, when you 
> get a network partition, it *looks* to each node like the other one(s) have 
> crashed.  So if they then down each other, you have split-brain.
>
> That's the key issue: from the *outside*, it's usually impossible to tell 
> the difference between a dead node and a network partition.  If the node is 
> dead, then sure, you want to down it.  But if it's a network partition, you 
> must *not* down it unless it has matching logic that causes it to 
> deliberately crash *itself*.  Without that, you're likely to get 
> split-brain.
>
> I may be misunderstanding you, but keep in mind that all the infamous 
> auto-down does is detect an Unreachable member, *wait* a few seconds, and 
> then down it.  It sounds like you're suggesting doing the same thing 
> without the wait, but the results will be the same...
>
