> On 11 Sep 2014, at 19:16, Joe Wong <[email protected]> wrote:
>
> Hi Patrik
>
> Thanks for the response.
>
> The downing of the nodes is done on the command line by calling the wrapper,
> which I believe just kills the process.

I was referring (in an unclear way) to downing of the Akka cluster member, e.g. with the auto-downing setting. If you have disabled that and don't do downing programmatically or manually (JMX), the cluster will not remove the member and you will not be able to join a node with the same host:port.

>
> I did add #auto-down-unreachable-after = 10s because we were having issues
> with nodes leaving the cluster but that issue was our bad. However, from time
> to time we see in the logs a worker node is marked UNREACHABLE and after 10
> seconds or so it's marked as REACHABLE so I left the config in thinking it would
> keep the workers in the cluster.

It can be a long GC pause, overload, or anything else that triggers the failure detector.

>
> So maybe I should try setting it to auto-down-unreachable-after = 20s.
> I'll have to find a time slot to do it as the issue occurs in
> the production environment and not in the staging environment.
>

Yes, try that.

/Patrik

> Regards,
>
>> On Tuesday, September 9, 2014 12:50:01 AM UTC-7, Patrik Nordwall wrote:
>> Hi Joe,
>>
>> How do you perform downing of the nodes? It will not be possible to join a
>> node with the same host:port until the previous member with the same host:port has
>> been removed from the cluster.
>> I noticed this in your config: #auto-down-unreachable-after = 10s
>>
>> Regards,
>> Patrik
>>
>>> On Tue, Sep 9, 2014 at 1:16 AM, Joe Wong <[email protected]> wrote:
>>> Hi Martynas,
>>>
>>> Thanks for the response. I checked the settings and can confirm they do not
>>> share the same hostname, port, and seed nodes.
>>>
>>> I was wondering, can we force the Cluster to allow a node to rejoin?
>>>
>>> Regards,
>>>
>>>
>>>
>>>> On Saturday, September 6, 2014 2:09:26 AM UTC-7, Martynas Mickevičius
>>>> wrote:
>>>> Hi Joe,
>>>>
>>>> your configuration seems correct and I tried to run a small example with
>>>> it and it works as expected.
>>>>
>>>> Are you sure you do not share hostname, port and seed-nodes configuration
>>>> between your staging and production environments? My guess would be that
>>>> an ActorSystem from staging interferes with an ActorSystem from production. I
>>>> know it's a long shot, but worth checking.
>>>>
>>>>
>>>>> On Thu, Sep 4, 2014 at 8:21 PM, Joe Wong <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> We are using Akka cluster where we have 2 types of nodes, master and
>>>>> worker. There are 2 master nodes, both are also seed nodes, and the
>>>>> actors for those nodes are cluster singletons. There are 8 worker nodes.
>>>>> All processes are started and stopped with Wrapper (Version 3.2.3)
>>>>> http://wrapper.tanukisoftware.org and each node is on its own virtual
>>>>> host.
>>>>>
>>>>> The issue we are noticing is that if we stop and start a worker, the cluster
>>>>> will ignore its attempt to rejoin. The log message is:
>>>>> 2014-09-03 22:36:35,107 INFO
>>>>> [ClusterSystem-akka.actor.default-dispatcher-3] Cluster Node
>>>>> [akka.tcp://blah blah blah] - Existing member
>>>>> [UniqueAddress(akka.tcp://blah blah blah)] is trying to join, ignoring
>>>>>
>>>>> We tried waiting for a while before restarting the worker but it didn't
>>>>> solve the issue. This doesn't happen in our staging environment, which has
>>>>> 2 workers. This points to a configuration difference between the 2
>>>>> environments, but I have double-checked them and they're identical other
>>>>> than the IP addresses and cluster name.
>>>>>
>>>>> Interestingly, once we stop a worker our production logs do show the
>>>>> cluster constantly repeating the gated message every 10 seconds or so.
>>>>> 2014-09-03 22:36:11,130 WARN
>>>>> [ClusterSystem-akka.actor.default-dispatcher-2] Association with remote
>>>>> system [akka.tcp://blah blah blah] has failed, address is now gated for
>>>>> [5000] ms. Reason is: [Association failed with [akka.tcp://blah blah
>>>>> blah]].
>>>>>
>>>>> There's another issue that may be related and it only happens in our
>>>>> production environment. The issue is that if we shut the "active" master
>>>>> process down, the 2nd master actor does not start up. The log files do
>>>>> show the cluster has detected that the "active" master is no longer
>>>>> responding.
>>>>>
>>>>> Below are the configurations for both Master and Worker.
>>>>>
>>>>> Any ideas? Thanks.
>>>>>
>>>>> Regards,
>>>>>
>>>>> **** MASTER config ****
>>>>> akka {
>>>>>   actor {
>>>>>     provider = "akka.cluster.ClusterActorRefProvider"
>>>>>     debug{
>>>>>       autoreceive = off
>>>>>       lifecycle = off
>>>>>       event-stream = off
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster-dispatcher{
>>>>>     type = "Dispatcher"
>>>>>     executor = "fork-join-executor"
>>>>>     fork-join-executor{
>>>>>       parallelism-min = 2
>>>>>       parallelism-max = 4
>>>>>     }
>>>>>   }
>>>>>
>>>>>   remote {
>>>>>     log-remote-lifecycle-events = off
>>>>>     log-reveived-message = off
>>>>>     netty.tcp {
>>>>>       hostname = "10.6.206.154"
>>>>>       port = 40000
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster {
>>>>>     seed-nodes = [
>>>>>       "akka.tcp://[email protected]:40000",
>>>>>       "akka.tcp://[email protected]:40000"
>>>>>     ]
>>>>>
>>>>>     roles=["MASTER", "SCHEDULER"]
>>>>>     retry-unsuccessful-join-after = 5s
>>>>>
>>>>>     auto-down-unreachable-after = 10s
>>>>>     #unreachable-nodes-reaper-interval = 1s
>>>>>
>>>>>     failure-detector{
>>>>>       #heartbeat-interval=1s
>>>>>       threshold = 12.0
>>>>>       #acceptable-heartbeat-pause=2s
>>>>>       #expected-response-after=2s
>>>>>     }
>>>>>
>>>>>     use-dispatcher = akka.cluster-dispatcher
>>>>>
>>>>>   }
>>>>>
>>>>>   loggers = ["akka.event.slf4j.Slf4jLogger"]
>>>>>   # Options: OFF, ERROR, WARNING, INFO, DEBUG
>>>>>   loglevel = "DEBUG"
>>>>>   log-config-on-start = off
>>>>>
>>>>> }
>>>>>
>>>>> **** WORKER config****
>>>>> akka {
>>>>>   actor {
>>>>>     provider = "akka.cluster.ClusterActorRefProvider"
>>>>>     debug{
>>>>>       autoreceive = off
>>>>>       lifecycle = off
>>>>>       event-stream = off
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster-dispatcher{
>>>>>     type = "Dispatcher"
>>>>>     executor = "fork-join-executor"
>>>>>     fork-join-executor{
>>>>>       parallelism-min = 2
>>>>>       parallelism-max = 4
>>>>>     }
>>>>>   }
>>>>>
>>>>>   remote {
>>>>>     log-remote-lifecycle-events = off
>>>>>     log-reveived-message = off
>>>>>     netty.tcp {
>>>>>       hostname = "10.6.206.136"
>>>>>       port = 45000
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster {
>>>>>     seed-nodes = [
>>>>>       "akka.tcp://[email protected]:40000",
>>>>>       "akka.tcp://[email protected]:40000"]
>>>>>
>>>>>     roles=["WORKER"]
>>>>>     retry-unsuccessful-join-after = 5s
>>>>>     #disable auto-down - worker should never leave the cluster
>>>>>     #auto-down-unreachable-after = 10s
>>>>>
>>>>>     use-dispatcher = akka.cluster-dispatcher
>>>>>   }
>>>>>
>>>>>   loggers = ["akka.event.slf4j.Slf4jLogger"]
>>>>>   # Options: OFF, ERROR, WARNING, INFO, DEBUG
>>>>>   loglevel = "INFO"
>>>>>   log-config-on-start = off
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>> --
>>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>> >>>>>>>>>> Check the FAQ:
>>>>> >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>>> ---
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "Akka User List" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>>> email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>>
>>>> --
>>>> Martynas Mickevičius
>>>> Typesafe – Reactive Apps on the JVM
>>
>>
>>
>> --
>> Patrik Nordwall
>> Typesafe - Reactive apps on the JVM
>> Twitter: @patriknw
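
For reference, the change agreed on at the top of this thread can be sketched as the fragment below. It is only an illustration: the cluster section is copied from the WORKER config as posted, with the auto-down line uncommented and set to the 20s value proposed for the trial. The 20s figure is the value suggested in the thread, not a tested recommendation, and seed-nodes are left out here because the addresses in the posted config are obscured.

```
akka {
  cluster {
    # seed-nodes omitted in this sketch; keep them as in the posted WORKER config

    roles = ["WORKER"]
    retry-unsuccessful-join-after = 5s

    # Assumption for this sketch: auto-downing re-enabled (it was commented
    # out in the posted WORKER config) with the 20s value proposed above.
    auto-down-unreachable-after = 20s

    use-dispatcher = akka.cluster-dispatcher
  }
}
```

With auto-downing enabled, a worker whose process has been killed by the wrapper should be downed and removed once it has stayed unreachable for the configured time, after which a new process on the same host:port can join. The trade-off is that a node that is unreachable for longer than that for other reasons, such as a long GC pause, would also be removed.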
