Re: [akka-user] Removing programmatically and dynamically Node from Cluster

Justin du coeur Wed, 13 Sep 2017 12:53:28 -0700

If I'm understanding you correctly, that's not really any better than the
broken auto-down system built into Akka.  You really need something
smarter for production.


*Conceptually*, there is a very straightforward strategy: when a node sees
nodes become unreachable, it checks whether it can see more than half of
the nodes it expects.  If it can, it assumes that things are otherwise
okay, and marks the unreachable node as down; if not, then it assumes that
it is on the "losing" side of a network partition, and self-destructs.
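
A minimal sketch of that decision rule, as a pure function with no Akka
dependency.  The names (Decision, DownUnreachable, DownSelf, Quorum.decide)
are hypothetical, not Akka API and not taken from any real implementation.
It assumes the expected cluster size is known and that the nodes a node
"can see" include the node itself:

```scala
// Hypothetical names throughout; this is an illustration, not Akka's API.
sealed trait Decision
case object DownUnreachable extends Decision // majority side: mark the unreachable nodes as down
case object DownSelf extends Decision        // minority (or tied) side: shut this node down

object Quorum {
  /** @param expected    number of nodes this node expects in the cluster
    * @param unreachable number of those nodes it currently cannot see
    */
  def decide(expected: Int, unreachable: Int): Decision = {
    require(expected > 0 && unreachable >= 0 && unreachable <= expected)
    val reachable = expected - unreachable // assumption: includes this node itself
    // Strictly more than half: an exact tie counts as losing, so that
    // two equal halves of a partition cannot both keep running.
    if (reachable * 2 > expected) DownUnreachable else DownSelf
  }
}
```

With five expected nodes, Quorum.decide(5, 2) yields DownUnreachable and
Quorum.decide(5, 3) yields DownSelf; with four nodes split evenly,
Quorum.decide(4, 2) yields DownSelf on both sides.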

That's easy to explain, but implementing all the details properly is
non-trivial.  (My own implementation
<https://github.com/jducoeur/Querki/blob/master/querki/scalajvm/app/querki/cluster/QuerkiNodeManager.scala>
has been in production for a while, but I suspect it still has problems with
some edge cases, and I plan to replace it with something more AWS-aware.)
You want some delay before acting, so that brief network hiccups don't
cause your nodes to self-destruct, and figuring out the correct definition
of "half" can be complex if the cluster isn't fixed-size.  But something
along those lines is pretty much necessary if you want a decently stable
cluster for production.
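
One way to picture the delay is to act only once the set of unreachable
nodes has stayed unchanged for some stability window.  The sketch below is
an assumption-laden illustration: Unreachability, observe, shouldAct and
stableAfter are hypothetical names, nodes are identified by plain strings,
and the caller is assumed to supply timestamps from a monotonic clock:

```scala
import scala.concurrent.duration._

// Hypothetical helper: remembers which nodes are unreachable and since when
// that set has been unchanged. Not part of Akka.
final case class Unreachability(nodes: Set[String], since: FiniteDuration) {

  /** Record an observation taken at time `now`; the timer restarts whenever the set changes. */
  def observe(current: Set[String], now: FiniteDuration): Unreachability =
    if (current == nodes) this else Unreachability(current, now)

  /** True once a non-empty unreachable set has been stable for at least `stableAfter`. */
  def shouldAct(now: FiniteDuration, stableAfter: FiniteDuration): Boolean =
    nodes.nonEmpty && (now - since) >= stableAfter
}
```

A node would call observe on every reachability change and only make its
down-or-self-destruct decision once shouldAct returns true, so a hiccup that
heals within the window never triggers anything.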

On Wed, Sep 13, 2017 at 2:54 PM, Sebastian Oliveri <[email protected]>
wrote:

> Hi,
>
> I have a cluster with a few nodes running cluster-sharded persistent
> actors that I am close to deploying to prod.
> I tested that once a node is unreachable, all the persistent actors inside
> it are unreachable as well until human intervention takes place to Down
> that unreachable node so the cluster can restore those actors on other
> nodes.
> I am not considering the commercial split brain strategies, so I have no
> other option than doing it manually.
> The worst case would be for the human intervention to take long, leaving
> clients unable to interact with the actors for a considerable time
> window.
> I was thinking about having an actor on each node, just like the
> SimpleClusterListener written in the akka cluster docs, that somehow
> marks the unreachable node as Down when handling the UnreachableMember
> event, using the akka management API through HTTP (no matter
> if it was a network partition or a crash).
> I don't know if this is possible because I haven't read about it as a
> solution so far. The worst case would be that, if it was actually a network
> partition that will eventually heal, the unreachable node will be Downed and
> removed from the cluster and all its persistent actors (maybe thousands,
> millions) would be restored again on other nodes, am I right?
>
>   def receive = {
>     case UnreachableMember(member) =>
>       akkaManagement.markAsDown(member.address) // something like this
>   }
>
>
> Is this OK as a solution to avoid human intervention?
>
> Thanks,
> Sebastian
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
