Re: [akka-user] Akka Cluster 2.3.4 nodes unable to reconnect to cluster

danny . browning Thu, 18 Sep 2014 04:39:26 -0700

If you're killing the process, send it a terminate signal (SIGTERM rather 
than SIGKILL) instead, and use a shutdown hook to take the node out of the 
cluster:


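import akka.cluster.Cluster

// Runs on a normal JVM termination signal (not on kill -9).
// Assumes `system` is this node's ActorSystem.
sys.addShutdownHook {
  val cluster = Cluster(system)
  cluster.down(cluster.selfAddress)   // mark this node as down so the cluster removes it
  cluster.leave(cluster.selfAddress)  // request a graceful leave
  system.shutdown()
}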
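
A possibly gentler variant: the hook above calls system.shutdown() right 
after leave, so the node may stop before the leave has spread to the other 
members. The sketch below, assuming Akka 2.3.x, only asks to leave from the 
hook and lets a small listener actor shut the system down once it sees this 
node's own removal. ShutdownOnRemoval, GracefulWorker, the actor name and the 
10 second timeout are made up for illustration; "ClusterSystem" is the system 
name that shows up in the log lines quoted below. I have not run this against 
Joe's setup.

```scala
import scala.concurrent.duration._

import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.MemberRemoved

// Hypothetical helper: stops the ActorSystem once this node has been
// removed from the cluster.
class ShutdownOnRemoval extends Actor {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit = cluster.subscribe(self, classOf[MemberRemoved])
  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case MemberRemoved(member, _) if member.address == cluster.selfAddress =>
      context.system.shutdown()
    case _ => // initial CurrentClusterState snapshot and removals of other members
  }
}

object GracefulWorker extends App {
  val system = ActorSystem("ClusterSystem")
  system.actorOf(Props[ShutdownOnRemoval], "shutdown-on-removal")

  sys.addShutdownHook {
    val cluster = Cluster(system)
    cluster.leave(cluster.selfAddress)
    // Give the cluster a bounded amount of time to remove this node,
    // then fall back to a plain shutdown.
    try system.awaitTermination(10.seconds)
    catch { case _: java.util.concurrent.TimeoutException => system.shutdown() }
  }
}
```

The timeout fallback is there so that a node which never sees its own 
removal (for example during a network problem) still exits. In that case the 
stale member would still have to be downed from another node before the same 
host:port can rejoin.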

On Thursday, September 11, 2014 11:16:18 AM UTC-6, Joe Wong wrote:
>
> Hi Patrik
>
> Thanks for the response.
>
> The downing of the nodes is done on the command line by calling the 
> wrapper which I believe just kills the process. 
>
> I did add #auto-down-unreachable-after = 10s because we were having 
> issues with nodes leaving the cluster, but that issue was our own fault. 
> However, from time to time the logs show a worker node marked UNREACHABLE 
> and then, after 10 seconds or so, marked REACHABLE again, so I left the 
> config in, thinking it would keep the workers in the cluster. 
>
> So maybe I should try setting auto-down-unreachable-after = 20s. I'll have 
> to find a time slot to do it, as the issue occurs in the production 
> environment and not in the staging environment.
>
> Regards,
>
> On Tuesday, September 9, 2014 12:50:01 AM UTC-7, Patrik Nordwall wrote:
>>
>> Hi Joe,
>>
>> How do you perform downing of the nodes? It is not possible to join a 
>> node with the same host:port until the previous member with the same 
>> host:port has been removed from the cluster.
>> I noticed this in your config: #auto-down-unreachable-after = 10s
>>
>> Regards,
>> Patrik
>>
>> On Tue, Sep 9, 2014 at 1:16 AM, Joe Wong <[email protected]> wrote:
>>
>>> Hi Martynas,
>>>
>>> Thanks for the response. I checked the settings and can confirm that the 
>>> environments do not share the same hostname, port, or seed nodes. 
>>>
>>> I was wondering, can we force the cluster to allow a node to rejoin? 
>>>
>>> Regards,
>>>
>>>
>>>
>>> On Saturday, September 6, 2014 2:09:26 AM UTC-7, Martynas Mickevičius wrote:
>>>>
>>>> Hi Joe,
>>>>
>>>> your configuration seems correct and I tried to run a small example 
>>>> with it and it works as expected.
>>>>
>>>> Are you sure you do not share hostname, port and seed-nodes 
>>>> configuration between your staging and production environments? My guess 
>>>> would be that an ActorSystem from staging interferes with an ActorSystem 
>>>> from production. I know it's a long shot, but worth checking.
>>>>
>>>>
>>>> On Thu, Sep 4, 2014 at 8:21 PM, Joe Wong <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We are using Akka cluster where we have 2 types of nodes, master and 
>>>>> worker.  There are 2 master nodes, both of which are also seed nodes, 
>>>>> and the actors for those nodes are cluster singletons. There are 8 
>>>>> worker nodes. All processes are started and stopped with Wrapper 
>>>>> (Version 3.2.3) http://wrapper.tanukisoftware.org and each node is on 
>>>>> its own virtual host.
>>>>>
>>>>> The issue we are noticing is that if we stop and start a worker, the 
>>>>> cluster will ignore its attempt to rejoin. The log message is:
>>>>> 2014-09-03 22:36:35,107  INFO 
>>>>> [ClusterSystem-akka.actor.default-dispatcher-3] 
>>>>> Cluster Node [akka.tcp://blah blah blah] - Existing member 
>>>>> [UniqueAddress(akka.tcp://blah blah blah)] is trying to join, ignoring
>>>>>
>>>>> We tried waiting for a while before restarting the worker but it 
>>>>> didn't solve the issue.  This doesn't happen in our staging environment, 
>>>>> which has 2 workers. This points to a configuration difference between 
>>>>> the 2 environments, but I have double-checked them and they're identical 
>>>>> other than the IP addresses and cluster name.
>>>>>
>>>>> Interestingly, once we stop a worker our production logs do show the 
>>>>> cluster constantly repeating the gated message every 10 seconds or so.
>>>>> 2014-09-03 22:36:11,130  WARN 
>>>>> [ClusterSystem-akka.actor.default-dispatcher-2] 
>>>>> Association with remote system [akka.tcp://blah blah blah] has failed, 
>>>>> address is now gated for [5000] ms. Reason is: [Association failed with 
>>>>> [akka.tcp://blah blah blah]].
>>>>>
>>>>> There's another issue that may be related, and it only happens in our 
>>>>> production environment. If we shut the "active" master process down, 
>>>>> the 2nd master actor does not start up. The log files do show that the 
>>>>> cluster has detected that the "active" master is no longer responding.
>>>>>
>>>>> Below are the configurations for both Master and Worker.
>>>>>
>>>>> Any ideas? Thanks.
>>>>>
>>>>> Regards,
>>>>>
>>>>> **** MASTER config ****
>>>>> akka {
>>>>>   actor {
>>>>>     provider = "akka.cluster.ClusterActorRefProvider"
>>>>>     debug{
>>>>>       autoreceive = off
>>>>>       lifecycle = off
>>>>>       event-stream = off
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster-dispatcher{
>>>>>     type = "Dispatcher"
>>>>>     executor = "fork-join-executor"
>>>>>     fork-join-executor{
>>>>>       parallelism-min = 2
>>>>>       parallelism-max = 4
>>>>>     }
>>>>>   }
>>>>>
>>>>>   remote {
>>>>>     log-remote-lifecycle-events = off
>>>>>     log-received-messages = off
>>>>>     netty.tcp {
>>>>>       hostname = "10.6.206.154"
>>>>>       port = 40000
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster {
>>>>>     seed-nodes = [
>>>>>       "akka.tcp://[email protected]:40000",
>>>>>       "akka.tcp://[email protected]:40000"
>>>>>     ]
>>>>>  
>>>>>     roles=["MASTER", "SCHEDULER"]
>>>>>     retry-unsuccessful-join-after = 5s
>>>>>
>>>>>     auto-down-unreachable-after = 10s
>>>>>     #unreachable-nodes-reaper-interval = 1s
>>>>>
>>>>>     failure-detector{
>>>>>       #heartbeat-interval=1s
>>>>>       threshold = 12.0
>>>>>       #acceptable-heartbeat-pause=2s
>>>>>       #expected-response-after=2s
>>>>>     }
>>>>>
>>>>>     use-dispatcher = akka.cluster-dispatcher
>>>>>
>>>>>   }
>>>>>
>>>>>   loggers = ["akka.event.slf4j.Slf4jLogger"]
>>>>>   # Options: OFF, ERROR, WARNING, INFO, DEBUG
>>>>>   loglevel = "DEBUG"
>>>>>   log-config-on-start = off
>>>>>   
>>>>> }
>>>>>
>>>>> **** WORKER config****
>>>>> akka {
>>>>>   actor {
>>>>>     provider = "akka.cluster.ClusterActorRefProvider"
>>>>>     debug{
>>>>>       autoreceive = off
>>>>>       lifecycle = off
>>>>>       event-stream = off
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster-dispatcher{
>>>>>     type = "Dispatcher"
>>>>>     executor = "fork-join-executor"
>>>>>     fork-join-executor{
>>>>>       parallelism-min = 2
>>>>>       parallelism-max = 4
>>>>>     }
>>>>>   }
>>>>>
>>>>>   remote {
>>>>>     log-remote-lifecycle-events = off
>>>>>     log-received-messages = off
>>>>>     netty.tcp {
>>>>>       hostname = "10.6.206.136"
>>>>>       port = 45000
>>>>>     }
>>>>>   }
>>>>>
>>>>>   cluster {
>>>>>     seed-nodes = [
>>>>>       "akka.tcp://[email protected]:40000",
>>>>>       "akka.tcp://[email protected]:40000"]
>>>>>
>>>>>     roles=["WORKER"]
>>>>>     retry-unsuccessful-join-after = 5s
>>>>>     #disable auto-down - worker should never leave the cluster
>>>>>     #auto-down-unreachable-after = 10s
>>>>>
>>>>>     use-dispatcher = akka.cluster-dispatcher
>>>>>   }
>>>>>
>>>>>   loggers = ["akka.event.slf4j.Slf4jLogger"]
>>>>>   # Options: OFF, ERROR, WARNING, INFO, DEBUG
>>>>>   loglevel = "INFO"
>>>>>   log-config-on-start = off
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>>  -- 
>>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>>> --- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "Akka User List" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Martynas Mickevičius
>>>> Typesafe <http://typesafe.com/> – Reactive 
>>>> <http://www.reactivemanifesto.org/> Apps on the JVM
>>>>  
>>>
>>
>>
>>
>> -- 
>>
>> Patrik Nordwall
>> Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
>> Twitter: @patriknw
>>
>> 

