This is happening a lot for me again, now on 2.3.11. I am running JMeter tests, so I am pretty sure load is the trigger. The cluster also sometimes gets into this state when I deploy other applications on the machines.
Is there some debug logging I can turn on to investigate this? It seems the cluster gossip does not converge...

Anders

On Tuesday, June 9, 2015 at 10:09:22 UTC+2, Anders Båtstrand wrote:
> I thought I was using 2.3.11, but this system was still on 2.3.10. I have
> upgraded, and will see if it happens again.
>
> I meant the nodes were downed, yes.
>
> Thank you for the bug pointer, that might be a way to trigger it!
>
> Anders
>
> On Sun, Jun 7, 2015 at 12:29, Akka Team <[email protected]> wrote:
>
> > Hi Anders,
> >
> > On Tue, Jun 2, 2015 at 3:15 PM, Anders Båtstrand <[email protected]>
> > wrote:
> >
> > > I now encountered the problem again: The cluster (3 nodes) suddenly has
> > > two leaders, and only one of the nodes reported all the other nodes to be
> > > part of the cluster.
> > >
> > > While it might have been triggered by high CPU, I am not sure why it did
> > > not self-heal. Should not the gossip converge?
> >
> > There are two things here. First, nodes just mark other nodes as
> > UNREACHABLE. This is a fully recoverable operation. DOWNING means that the
> > node has been removed and cannot come back until it has been restarted.
> > When you say only one of the nodes reported the other nodes to be part of
> > the cluster did you mean that the other nodes have seen this UNREACHABLE,
> > or have they downed it?
> >
> > > When I checked the system, all applications were running fine, with almost
> > > no load.
> > >
> > > What I don't understand is the following:
> > >
> > > If one node reports another node to be up, how can it be possible that the
> > > other node reports the first node to be down (I am using auto-down)?
> >
> > Hmm, this reminds me of an older ticket:
> > https://github.com/akka/akka/issues/16624
> >
> > Which version of Akka are you using? Does this happen with 2.3.11?
> >
> > -Endre
> >
> > > Best regards,
> > >
> > > Anders
> >
> > --
> > Akka Team
> > Typesafe - Reactive apps on the JVM
> > Blog: letitcrash.com
> > Twitter: @akkateam

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
