[akka-user] Question about multi data center behavior when a data center is empty

lemmsjid Sun, 04 Feb 2018 14:57:59 -0800

Hi everyone, I have a question pertaining to the multi data center 
behavior.  In a nutshell, I'm wondering what the expected behavior is if 
all of the nodes (especially the last one) in a data center go down, and 
how the other data centers would respond to that.


In my experiments, when the last node in a data center goes down 
(gracefully or not) while nodes remain in other data centers, the 
remaining nodes are unable to remove that node from the cluster.  And if 
that node is restarted, it cannot rejoin.  I'm wondering if that is 
expected behavior or if I've done something wrong.

I've created a toy example that illustrates the behavior I'm encountering 
(on the latest Akka release).

Node "A" exists in data center "DC_A"
Node "B" exists in data center "DC_B"
Node "C" does not yet exist in data center "DC_B" (e.g. I haven't turned it 
on yet)
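
For reference, the data center assignment in a setup like this comes down to 
one setting per node. This is a minimal sketch of what node A's 
configuration might look like, not a copy of my actual files; nodes B and C 
would use "DC_B" instead.

```
# Sketch only: node A's application.conf (B and C would set "DC_B")
akka {
  actor.provider = "cluster"
  cluster.multi-data-center.self-data-center = "DC_A"
}
```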

Auto-downing is off, but I have an API set up that lets me instruct any 
given node to either Down another node or have it Leave the cluster.
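
To make it concrete what that API does on each node, here is a rough sketch 
of the calls involved. The object name ClusterAdmin and its helper methods 
are made up for illustration; only Cluster(system).down, 
Cluster(system).leave, Cluster(system).state and AddressFromURIString are 
actual Akka calls.

```scala
import akka.actor.{ActorSystem, Address, AddressFromURIString}
import akka.cluster.Cluster

// Hypothetical helper object; the names are illustrative, not from my real code.
object ClusterAdmin {

  // Turn a string such as "akka.tcp://ClusterSystem@host-b:2552" into an Address.
  def parseTarget(uri: String): Address = AddressFromURIString(uri)

  // Ask the local node to mark `target` as Down.
  def down(system: ActorSystem, target: Address): Unit =
    Cluster(system).down(target)

  // Ask the local node to request a graceful Leave for `target`.
  def leave(system: ActorSystem, target: Address): Unit =
    Cluster(system).leave(target)

  // Print the local node's current view of the membership,
  // including each member's data center and status.
  def printMembers(system: ActorSystem): Unit =
    Cluster(system).state.members.foreach { m =>
      println(s"${m.address} dc=${m.dataCenter} status=${m.status}")
    }
}
```

In the scenario below, "telling A to down B" means running something like 
ClusterAdmin.down(system, ClusterAdmin.parseTarget(...)) on node A with B's 
address, where the system name, host and port are whatever B was started with.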

   1. B gracefully exits the cluster.  
   2. DC_B is now empty
   3. A sees that B is in the "Exiting" state.  B is never actually removed; 
   it stays in that state indefinitely.
   4. A keeps trying to reconnect to B, so I want to clean up its state.  
   But if I tell A to down B, B remains in the Exiting state (as far as A's 
   view of the cluster goes).  
   5. If I start B up again, it cannot rejoin: "A" will spam its logs with 
   "New incarnation of existing member... is trying to join. Existing member 
   will be removed from the cluster and then new member will be allowed to 
   join."
   6. But the status quo continues.  A keeps trying to reconnect to B, but 
   B exists forever in the Exiting state.  I continue telling A to down B 
   manually (having verified that my code is correct for that), but nothing 
   happens.
   7. The only way I have found to resolve the situation is to start a new 
   node in B's data center.  So now I start up C, which joins DC_B as its 
   only member.  Eventually C figures out that B should be downed and downs 
   it, after which B is allowed to rejoin the cluster.

The above happens in the same way if B does not gracefully exit, e.g. I 
give it a kill -9.  

None of this happens if A and B are in the same data center: A marks B as 
down and B is removed from the cluster.

I suppose my question is: is this expected behavior?  I've read the multi 
data center docs and can't get a handle on the expected behavior in that 
situation.  On one hand I understand that the point of the multi-datacenter 
functionality is to keep them partitioned, so obviously A wouldn't do 
anything automatically against B, or anything else outside of its data 
center.  But shouldn't I be able to tell nodes outside of a data center to 
down all the nodes in another data center, seeing as I know they're 
permanently down?  Perhaps there is some other way of cleaning up the 
cluster state that I'm not aware of.  

