You should probably also look into why they are quarantined. There are two possible reasons:
1) The nodes are removed from the cluster. This happens if failure detection triggers, you use auto-downing, and the nodes don't become reachable again within the configured akka.cluster.auto-down-unreachable-after timeout. You might want to increase the auto-down timeout.
2) The system message delivery buffer overflows because of many remote watches or remote deployments. You might want to increase akka.remote.system-message-buffer-size, or adjust your design.

Cheers,
Patrik

On Fri, Feb 6, 2015 at 10:58 AM, Akka Team <[email protected]> wrote:
> Hi Mark,
>
> On Tue, Feb 3, 2015 at 5:13 PM, Mark Kegel <[email protected]> wrote:
>
>> We are using Akka 2.3.4, but I don't think this is an issue with a
>> specific version of Akka. In fact, the docs explicitly state that you have
>> to restart the Akka node after it has been quarantined.
>>
>> I'm looking for some way to detect that my node has been quarantined, so
>> that I can force an exit and let our Puppet system restart it, or just
>> restart the Akka system programmatically without exiting the process. This
>> seems like basic error handling and recovery, but I see nothing in the docs
>> on how a person is supposed to handle this, or how they can even be
>> notified of the issue.
>>
>
> I agree that we can improve the documentation around this. The remoting
> publishes events that you can subscribe to:
>
> http://doc.akka.io/docs/akka/2.3.9/scala/remoting.html#Remote_Events
>
> One of those published events notifies of quarantine:
> http://doc.akka.io/api/akka/2.3.9/#akka.remote.QuarantinedEvent
>
> -Endre
>
>
>> Is there any kind of exception that bubbles back to user code, or a
>> cluster state message that I can receive, for when my local Akka instance
>> can't rejoin the cluster?
>>
>> Is there any way a supervisor hierarchy can help solve this problem?
>>
>> If someone can point me to code that is able to respond to and recover from
>> such failures intelligently, using Akka-approved idioms, that would be
>> most appreciated.
>>
>> Mark
>>
>> On Tuesday, February 3, 2015 at 6:32:20 AM UTC-6, Patrik Nordwall wrote:
>>>
>>> What version of Akka are you using? We fixed some issues related to
>>> quarantining in 2.3.9.
>>> /Patrik
>>>
>>> On Mon, Jan 26, 2015 at 5:20 PM, Mark Kegel <[email protected]> wrote:
>>>
>>>> We are using Akka in a clustered configuration at work. It's a very
>>>> simple cluster with just three node types: an admin node, "live" nodes, and
>>>> "preview" nodes. The admin node manages nodes of the other two types
>>>> and asks for things like status and uptime. Every so often one of the
>>>> live/preview nodes becomes unresponsive to requests from the admin
>>>> node. The only way we've been able to fix this is to restart the node.
>>>>
>>>> From reading the Akka docs, this seems to correspond to the node
>>>> becoming quarantined. While I appreciate that this state is necessary to
>>>> maintain consistency, I'm at a loss to find docs that show how to
>>>> respond in code when this happens. On our admin node we'll know that some
>>>> other live/preview node has failed and will require a restart, but what
>>>> would work best is if we could have a service watching locally on the
>>>> failed live/preview node that could force a restart of that node's JVM.
>>>>
>>>> Is there any kind of exception that bubbles back to user code, or a
>>>> cluster state message that I can receive, for when my local Akka instance
>>>> can't rejoin the cluster?
>>>>
>>>> Is there any way a supervisor hierarchy can help solve this problem?
>>>>
>>>> If someone can point me to code that is able to respond to and recover
>>>> from such failures intelligently, using Akka-approved idioms, that
>>>> would be most appreciated.
>>>> >>>> Mark >>>> >>>> -- >>>> >>>>>>>>>> Read the docs: http://akka.io/docs/ >>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/ >>>> current/additional/faq.html >>>> >>>>>>>>>> Search the archives: https://groups.google.com/ >>>> group/akka-user >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "Akka User List" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> To post to this group, send email to [email protected]. >>>> Visit this group at http://groups.google.com/group/akka-user. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> >>> >>> -- >>> >>> Patrik Nordwall >>> Typesafe <http://typesafe.com/> - Reactive apps on the JVM >>> Twitter: @patriknw >>> >>> -- >> >>>>>>>>>> Read the docs: http://akka.io/docs/ >> >>>>>>>>>> Check the FAQ: >> http://doc.akka.io/docs/akka/current/additional/faq.html >> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user >> --- >> You received this message because you are subscribed to the Google Groups >> "Akka User List" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at http://groups.google.com/group/akka-user. >> For more options, visit https://groups.google.com/d/optout. >> > > > > -- > Akka Team > Typesafe - The software stack for applications that scale > Blog: letitcrash.com > Twitter: @akkateam > > -- > >>>>>>>>>> Read the docs: http://akka.io/docs/ > >>>>>>>>>> Check the FAQ: > http://doc.akka.io/docs/akka/current/additional/faq.html > >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user > --- > You received this message because you are subscribed to the Google Groups > "Akka User List" group. 
> To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/akka-user. > For more options, visit https://groups.google.com/d/optout. > -- Patrik Nordwall Typesafe <http://typesafe.com/> - Reactive apps on the JVM Twitter: @patriknw -- >>>>>>>>>> Read the docs: http://akka.io/docs/ >>>>>>>>>> Check the FAQ: >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user --- You received this message because you are subscribed to the Google Groups "Akka User List" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/akka-user. For more options, visit https://groups.google.com/d/optout.
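Both of Patrik's suggestions come down to configuration. Here is a minimal sketch that sets the two keys programmatically with Typesafe Config, the library Akka reads its settings from. The values 120s and 20000 are illustrative assumptions, not recommendations. The object name QuarantineTuning is hypothetical. Setting any duration for auto-down-unreachable-after is what turns auto-downing on (the 2.3 default is `off`), so only raise it if you already rely on auto-downing.

```scala
import com.typesafe.config.{Config, ConfigFactory}
import java.util.concurrent.TimeUnit

// Hypothetical helper; the values below are illustrative assumptions.
object QuarantineTuning {
  // Note: giving auto-down-unreachable-after a duration enables auto-downing.
  val overrides: Config = ConfigFactory.parseString(
    """
      |akka.cluster.auto-down-unreachable-after = 120s
      |akka.remote.system-message-buffer-size = 20000
      |""".stripMargin)

  // Layer the overrides on top of application.conf / reference.conf.
  val config: Config = overrides.withFallback(ConfigFactory.load())

  def main(args: Array[String]): Unit = {
    val downAfterSeconds =
      config.getDuration("akka.cluster.auto-down-unreachable-after", TimeUnit.SECONDS)
    val bufferSize = config.getInt("akka.remote.system-message-buffer-size")
    println(s"auto-down after ${downAfterSeconds}s, system message buffer $bufferSize")
    // Pass `config` to ActorSystem("name", config) when starting the node.
  }
}
```

The same two lines can go directly into application.conf. Merging in code is only needed when the values are decided at startup.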

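Endre's pointer can be turned into a small listener actor. This is a minimal sketch assuming Akka 2.3.x classic remoting. The actor subscribes to akka.remote.QuarantinedEvent on the actor system's event stream and hands the quarantined address to a callback. The class name QuarantineListener, the onQuarantined hook and the AdminNode wiring are hypothetical. As far as I can tell, in 2.3 this event is published by the system that does the quarantining (for example the admin node). On that side it tells you which live/preview node needs a restart. I'm not certain the quarantined node itself receives a matching event in that version.

```scala
import akka.actor.{Actor, ActorLogging, ActorSystem, Address, Props}
import akka.remote.QuarantinedEvent

// Sketch assuming Akka 2.3.x. `onQuarantined` is a hypothetical hook: plug in
// whatever tells your tooling (e.g. Puppet) to restart the node at `address`.
class QuarantineListener(onQuarantined: Address => Unit) extends Actor with ActorLogging {

  // Start listening for quarantine events when the actor starts...
  override def preStart(): Unit =
    context.system.eventStream.subscribe(self, classOf[QuarantinedEvent])

  // ...and stop listening when it stops.
  override def postStop(): Unit =
    context.system.eventStream.unsubscribe(self)

  def receive: Receive = {
    case QuarantinedEvent(address, uid) =>
      // This system has quarantined the remote incarnation (address + uid);
      // that remote actor system must be restarted before it can reconnect.
      log.warning("Quarantined remote system {} (uid {})", address, uid)
      onQuarantined(address)
  }
}

object QuarantineListener {
  def props(onQuarantined: Address => Unit): Props =
    Props(new QuarantineListener(onQuarantined))
}

// Hypothetical wiring on the admin node.
object AdminNode extends App {
  val system = ActorSystem("admin")
  system.actorOf(
    QuarantineListener.props(address => println(s"Restart required for $address")),
    "quarantine-listener")
}
```

A node might decide it must restart, for example via a hook like this one. In Akka 2.3 it can then call system.shutdown() followed by system.awaitTermination(), and exit the JVM with a non-zero status so that an external supervisor such as Puppet starts it again.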