Re: [akka-user] ClusterSingletonManager singleton actors not failing over (2 node cluster)

Patrik Nordwall Tue, 01 Dec 2015 10:52:34 -0800

Have you tried a stable version? Latest is 2.4.1
/Patrik
On Wed, 25 Nov 2015 at 17:02, Alex Piggott <[email protected]> wrote:


>
> I couldn't find anything like this on the mailing list or github
>
> I have a 2 node cluster running 2.4-M1 (scala 2.11). I use several
> singleton actors (their code is unimportant as we'll see)
>
> When I start both nodes at the same time, the singletons appear in the
> first node to launch, as expected. (I have a log in the actor constructors.)
>
> (I also have a JUnit test where I launch 2 processes and check that the
> actors fail over correctly)
>
> However when I stop the node containing the singleton actors on my
> cluster, the actors do _not_ (ever) fail over to the other node.
>
> The logging looks a bit like this:
>
> ON NODE1, when it starts (note these messages are all x5 times for $a-$f)
>
> [INFO] [11/25/2015 10:13:05.174] [aleph2-akka.actor.default-dispatcher-23]
> [akka.tcp://[email protected]:2252/user/$e] ClusterSingletonManager
> state change [Start -> Younger]
> [INFO] [11/25/2015 10:14:45.511] [aleph2-akka.actor.default-dispatcher-3]
> [akka.tcp://[email protected]:2252/user/$d] Younger observed
> OldestChanged: [Some(akka.tcp://[email protected]:2252) -> myself]
> [INFO] [11/25/2015 10:14:45.515] [aleph2-akka.actor.default-dispatcher-3]
> [akka.tcp://[email protected]:2252/user/$d] ClusterSingletonManager
> state change [Younger -> BecomingOldest]
> [INFO] [11/25/2015 10:14:50.604] [aleph2-akka.actor.default-dispatcher-15]
> [akka.tcp://[email protected]:2252/user/$a] Retry [5], sending
> HandOverToMe to [Some(akka.tcp://[email protected]:2252)]
> [INFO] [11/25/2015 10:14:56.715] [aleph2-akka.actor.default-dispatcher-4]
> [akka.tcp://[email protected]:2252/user/$c] Timeout in BecomingOldest.
> Previous oldest unknown, removed and no TakeOver request.
> [INFO] [11/25/2015 10:14:56.715] [aleph2-akka.actor.default-dispatcher-4]
> [akka.tcp://[email protected]:2252/user/$c] Singleton manager
> [akka.tcp://[email protected]:2252] starting singleton actor
> [INFO] [11/25/2015 10:14:56.717] [aleph2-akka.actor.default-dispatcher-4]
> [akka.tcp://[email protected]:2252/user/$c] ClusterSingletonManager
> state change [BecomingOldest -> Oldest]
>
>
> And my singleton c'tor log message appears, good
>
> On NODE1, when I close it down:
>
> [INFO] [11/25/2015 10:24:35.508] [aleph2-akka.actor.default-dispatcher-18]
> [akka.cluster.Cluster(akka://aleph2)] Cluster Node [akka.tcp://
> [email protected]:2252] - Successfully shut down
> [INFO] [11/25/2015 10:24:35.510] [aleph2-akka.actor.default-dispatcher-16]
> [akka.tcp://[email protected]:2252/user/$b] ClusterSingletonManager
> state change [Oldest -> WasOldest]
> [INFO] [11/25/2015 10:24:35.568] [aleph2-akka.actor.default-dispatcher-23]
> [akka.tcp://[email protected]:2252/user/$e] ClusterSingletonManager
> state change [WasOldest -> HandingOver]
>
>
> Note that I also have a log message in the actors' "postStop" calls, and
> they _don't_ get called.
>
> OK, here are the NODE2 logs, where you can clearly see it start to hand the
> singletons over but then stop:
>
> [INFO] [11/25/2015 10:24:35.517] [aleph2-akka.actor.default-dispatcher-19]
> [akka.tcp://[email protected]:2252/user/$b] Ignoring TakeOver request in
> [Younger] from [akka.tcp://[email protected]:2252].
> [INFO] [11/25/2015 10:24:35.555] [aleph2-akka.actor.default-dispatcher-23]
> [akka.tcp://[email protected]:2252/user/$c] Younger observed
> OldestChanged: [Some(akka.tcp://[email protected]:2252) -> myself]
> [INFO] [11/25/2015 10:24:35.558] [aleph2-akka.actor.default-dispatcher-19]
> [akka.tcp://[email protected]:2252/user/$a] ClusterSingletonManager
> state change [Younger -> BecomingOldest]
> [INFO] [11/25/2015 10:24:35.571] [aleph2-akka.actor.default-dispatcher-19]
> [akka.tcp://[email protected]:2252/user/$e] Hand-over in progress at
> [akka.tcp://[email protected]:2252]
>
> [WARN] [11/25/2015 10:24:36.954]
> [aleph2-akka.remote.default-remote-dispatcher-21] [akka.tcp://
> [email protected]:2252/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Faleph2%4010.1.100.60%3A2252-0]
> Association with remote system [akka.tcp://[email protected]:2252] has
> failed, address is now gated for [5000] ms. Reason: [Disassociated]
> [INFO] [11/25/2015 10:24:37.224] [aleph2-akka.actor.default-dispatcher-20]
> [akka://aleph2/deadLetters] Message
> [akka.cluster.ClusterHeartbeatSender$Heartbeat] from
> Actor[akka://aleph2/system/cluster/core/daemon/heartbeatSender#1281886041]
> to Actor[akka://aleph2/deadLetters] was not delivered. [1] dead letters
> encountered. This logging can be turned off or adjusted with configuration
> settings 'akka.log-dead-letters' and
> 'akka.log-dead-letters-during-shutdown'.
>
> [INFO] [11/25/2015 10:24:40.597] [aleph2-akka.actor.default-dispatcher-16]
> [akka.cluster.Cluster(akka://aleph2)] Cluster Node [akka.tcp://
> [email protected]:2252] - Marking exiting node(s) as UNREACHABLE
> [Member(address = akka.tcp://[email protected]:2252, status = Exiting)].
> This is expected and they will be removed.
> [INFO] [11/25/2015 10:24:40.602] [aleph2-akka.actor.default-dispatcher-16]
> [akka.cluster.Cluster(akka://aleph2)] Cluster Node [akka.tcp://
> [email protected]:2252] - Leader is removing exiting node [akka.tcp://
> [email protected]:2252]
>
> [INFO] [11/25/2015 10:24:40.606] [aleph2-akka.actor.default-dispatcher-3]
> [akka.tcp://[email protected]:2252/user/$d] Previous oldest [akka.tcp://
> [email protected]:2252] removed
>
>
>
> It gets Younger->BecomingOldest but never makes it to Oldest. Not sure if
> the WARN/INFO in the middle are relevant, or whether they're part of other
> bits of the cluster
>
> I am running the default config except with the following overrides:
>
> .put("akka.actor.provider", "akka.cluster.ClusterActorRefProvider")
> .put("akka.extensions", Arrays.asList("akka.cluster.pubsub.DistributedPubSub"))
> .put("akka.remote.netty.tcp.port", port.toString())
> .put("akka.cluster.seed.zookeeper.url", _config_bean.zookeeper_connection())
> .put("akka.cluster.auto-down-unreachable-after", "120s")
> .put("akka.cluster.pub-sub.routing-logic", "round-robin")
>
>
>
> Only other relevant thing I can think of is that I have a shutdown hook
> that calls
>
>
> Cluster.get(_akka_system.get()).leave(ZookeeperClusterSeed.get(_akka_system.get()).address());
>
> (and then waits 5s)
>
> Can anyone see anything odd, or anything they recognize? Many thanks in
> advance for any help
>
> Alex
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
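
When comparing the logs from the two nodes, it may help to check mechanically
which singleton managers ever reached Oldest. Below is a minimal sketch, not
part of Akka: it scans pasted log text for the "ClusterSingletonManager state
change [X -> Y]" records shown in the quoted message and reports the last state
logged for each manager ($a, $b, ...), so one stuck in BecomingOldest stands
out. The class name SingletonStateCheck, the method lastStates and the host
name node2 in the sample are made up for illustration, and the regex assumes
only the record layout seen in the quoted logs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingletonStateCheck {
    // Matches e.g. ".../user/$c] ClusterSingletonManager state change [BecomingOldest -> Oldest]".
    // \s+ also spans the line breaks that mail clients insert into long log records.
    private static final Pattern STATE_CHANGE = Pattern.compile(
        "/user/(\\$\\w+)\\]\\s+ClusterSingletonManager\\s+state\\s+change\\s+\\[(\\w+)\\s*->\\s*(\\w+)\\]");

    /** Returns the last logged state per manager actor name, in order of first appearance. */
    public static Map<String, String> lastStates(String log) {
        // Drop mail quote markers ("> ") at the start of each line.
        String unquoted = log.replaceAll("(?m)^>+ ?", "");
        Map<String, String> last = new LinkedHashMap<>();
        Matcher m = STATE_CHANGE.matcher(unquoted);
        while (m.find()) {
            last.put(m.group(1), m.group(3)); // later records overwrite earlier ones
        }
        return last;
    }

    public static void main(String[] args) {
        // Hypothetical sample in the layout of the quoted logs; "node2" is a placeholder host.
        String log = String.join("\n",
            "[akka.tcp://aleph2@node2:2252/user/$a] ClusterSingletonManager",
            "state change [Start -> Younger]",
            "[akka.tcp://aleph2@node2:2252/user/$c] ClusterSingletonManager",
            "state change [Younger -> BecomingOldest]",
            "[akka.tcp://aleph2@node2:2252/user/$a] ClusterSingletonManager",
            "state change [Younger -> BecomingOldest]",
            "[akka.tcp://aleph2@node2:2252/user/$c] ClusterSingletonManager",
            "state change [BecomingOldest -> Oldest]");
        // $a never logged a transition to Oldest, $c did.
        System.out.println(lastStates(log));
    }
}
```

Any manager whose last state is not Oldest on the surviving node is a
candidate for the stalled hand-over described in the quoted message.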
