Using Akka Cluster for this type of communication is a bad idea. Running Akka clustering over an unstable network or on unstable devices will result in network issues between the nodes. When the cluster has network issues for a little while, it will eventually mark the node as unreachable, and then as down.
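The unreachable-then-down progression can be sketched as a small state machine. This is a minimal model under stated assumptions, not Akka's code: Akka's cluster uses a phi accrual failure detector rather than the fixed heartbeat timeout used here, the class `ReachabilitySketch` is hypothetical, and both thresholds are made-up illustration values. Auto-downing only happens when it is enabled (in Akka 2.3 via `akka.cluster.auto-down-unreachable-after`). What the sketch shows is that "unreachable" is reversible when heartbeats resume, while "down" is not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of how one cluster node judges a peer:
// UP -> UNREACHABLE (missed heartbeats) -> DOWN (auto-down decision).
// Real Akka uses a phi accrual failure detector, not a fixed timeout.
public class ReachabilitySketch {
    public enum Status { UP, UNREACHABLE, DOWN }

    // Hypothetical thresholds in milliseconds, for illustration only.
    static final long HEARTBEAT_TIMEOUT_MS = 3_000;
    static final long AUTO_DOWN_AFTER_MS = 10_000;

    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Long> unreachableSince = new HashMap<>();
    private final Map<String, Status> status = new HashMap<>();

    /** Returns false for a peer that was already downed: it is refused. */
    public boolean join(String peer, long nowMs) {
        if (status.get(peer) == Status.DOWN) {
            return false;
        }
        lastHeartbeat.put(peer, nowMs);
        status.put(peer, Status.UP);
        return true;
    }

    /** A heartbeat revives an UNREACHABLE peer, but never a DOWN one. */
    public void heartbeat(String peer, long nowMs) {
        Status s = status.get(peer);
        if (s == null || s == Status.DOWN) {
            return;
        }
        lastHeartbeat.put(peer, nowMs);
        unreachableSince.remove(peer);
        status.put(peer, Status.UP);
    }

    /** Called periodically: applies the timeout detector and the auto-down rule. */
    public void tick(long nowMs) {
        for (Map.Entry<String, Status> e : status.entrySet()) {
            String peer = e.getKey();
            if (e.getValue() == Status.UP
                    && nowMs - lastHeartbeat.get(peer) > HEARTBEAT_TIMEOUT_MS) {
                e.setValue(Status.UNREACHABLE);
                unreachableSince.put(peer, nowMs);
            } else if (e.getValue() == Status.UNREACHABLE
                    && nowMs - unreachableSince.get(peer) > AUTO_DOWN_AFTER_MS) {
                e.setValue(Status.DOWN);
            }
        }
    }

    public Status statusOf(String peer) {
        return status.get(peer);
    }

    public static void main(String[] args) {
        ReachabilitySketch master = new ReachabilitySketch();
        master.join("worker", 0);
        master.tick(4_000);                 // no heartbeat for 4 s
        System.out.println(master.statusOf("worker"));      // UNREACHABLE
        master.heartbeat("worker", 5_000);  // network recovers: UP again
        master.tick(9_000);                 // silent again: UNREACHABLE
        master.tick(20_000);                // unreachable for 11 s: DOWN
        System.out.println(master.statusOf("worker"));      // DOWN
        System.out.println(master.join("worker", 21_000));  // false
    }
}
```

If both sides of a partition run this logic with auto-downing enabled, each one eventually downs the other, which is how auto-downing can leave you with two separate clusters.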
When the node has reached the down state, it will be refused when it tries to connect to the cluster. You then have to manually remove the node before it can reconnect. This is the only safe way of doing things with Akka Cluster. You can use auto-downing, and maybe that works for you, but you then run the risk of having two separate clusters. I would look at other technologies for this problem; we are looking at removing Akka and maybe using Event Store for our synchronization needs, as Event Store is quorum based.

On Thu, May 28, 2015 at 12:17 PM <[email protected]> wrote:
> Hi Again,
>
> I noticed a few more things:
>
> First of all, I ran the same test (master with a single worker) with a
> physical Android device (instead of an emulator), and the cluster
> disconnection was much less frequent (took more than two hours), but with
> the same symptoms.
> I therefore suspect that the problem may be related to an arbitrary
> communication failure with the device - that happens less frequently with
> physical devices (as they are more stable).
>
> Secondly, looking at the Akka logs from the master side, after the
> disassociation event occurs, I start getting dead-letters messages:
> [INFO] [05/28/2015 09:59:50.715]
> [ClusterSystem-akka.actor.default-dispatcher-25]
> [akka://ClusterSystem/deadLetters] Message [akka.cluster.GossipStatus] from
> Actor[akka://ClusterSystem/system/cluster/core/daemon#-1559364220] to
> Actor[akka://ClusterSystem/deadLetters] was not delivered. [3] dead letters
> encountered. This logging can be turned off or adjusted with configuration
> settings 'akka.log-dead-letters' and
> 'akka.log-dead-letters-during-shutdown'.
> [INFO] [05/28/2015 09:59:50.731]
> [ClusterSystem-akka.actor.default-dispatcher-19]
> [akka://ClusterSystem/deadLetters] Message
> [akka.contrib.pattern.DistributedPubSubMediator$Internal$Status] from
> Actor[akka://ClusterSystem/user/distributedPubSubMediator#-825410933] to
> Actor[akka://ClusterSystem/deadLetters] was not delivered.
> [4] dead letters
> encountered. This logging can be turned off or adjusted with configuration
> settings 'akka.log-dead-letters' and
> 'akka.log-dead-letters-during-shutdown'.
>
> After a few seconds, I notice that the connection to the worker was
> refused:
> [WARN] [05/28/2015 09:59:54.746]
> [ClusterSystem-akka.remote.default-remote-dispatcher-26] [akka.tcp://
> ClusterSystem@10.141.4.104:2551/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40127.0.0.1%3A2553-0]
> Association with remote system [akka.tcp://ClusterSystem@127.0.0.1:2553]
> has failed, address is now gated for [5000] ms. Reason: [Association failed
> with [akka.tcp://ClusterSystem@127.0.0.1:2553]] Caused by: [Connection
> refused: /127.0.0.1:2553]
>
> Afterwards I see two more heartbeats sent to the worker (with no
> response), before it is marked unreachable (the IP of the other node is
> localhost as it runs on a local Android emulator).
>
> My questions are:
> 1. Is it possible that the other node actively refuses the TCP connection?
> If so, why, and how can I avoid it?
> 2. The cluster can generally recover from the exception that I posted in
> the first message of this thread, right? If so, what stops the cluster
> from doing so?
>
> Thanks,
> Nozik
>
>
>
> On Tuesday, May 26, 2015 at 4:55:39 PM UTC+3, Ran Nozik wrote:
>>
>> Hi Endre,
>>
>> Thank you for your quick response.
>>
>> I verified that the only protobuf version we use
>> is com.google.protobuf:protobuf-java:2.5.0 (no other versions in the
>> classpath).
>>
>> I'm not sure I understood your question about the remoting. We have a
>> distributed system with many (backend) Android workers and one master
>> (frontend) node. They do not interact as client and server.
>>
>> Regards,
>> Nozik
>>
>> On Tue, May 26, 2015 at 4:21 PM, Endre Varga <[email protected]>
>> wrote:
>>
>>> Caused by: com.google.protobuf.UninitializedMessageException: Message
>>> missing required fields:
>>> ... 30 more
>>> ]
>>>
>>> This very much looks like a serialization problem though. Do you maybe
>>> have a newer protobuf version on your classpath than the one Akka uses?
>>>
>>> Btw, why are you using akka-remoting between Android systems? Don't
>>> forget that remoting and clustering are not client-server technologies but
>>> peer-to-peer technologies:
>>> http://doc.akka.io/docs/akka/2.3.11/general/remoting.html#Peer-to-Peer_vs__Client-Server
>>>
>>> -Endre
>>>
>>> On Tue, May 26, 2015 at 3:16 PM, <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I upgraded to 2.3.11 and the problem reproduced again.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Tuesday, May 26, 2015 at 3:12:38 PM UTC+3, √ wrote:
>>>>>
>>>>> Hi Nozik,
>>>>>
>>>>> please upgrade to the latest version and report back if you still have
>>>>> the same problem.
>>>>>
>>>>> On Tue, May 26, 2015 at 2:03 PM, <[email protected]> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I've been trying to set up an Akka cluster with one master node and
>>>>>> multiple workers. The workers are actor systems that run on Android
>>>>>> emulators.
>>>>>> As a start, I work with one worker (emulator). I verify that it
>>>>>> successfully joins the cluster and start sending it messages, which are
>>>>>> handled successfully. After some time (from 2-3 to 30-40 minutes),
>>>>>> however,
>>>>>> it disconnects from the cluster.
>>>>>> Trying to figure out what causes the problem, I noticed that even if
>>>>>> the worker is idle (no messages are sent), it disconnects from the
>>>>>> cluster
>>>>>> after some time.
>>>>>>
>>>>>> In the Android logcat, the following message is displayed:
>>>>>>
>>>>>> [ClusterSystem-akka.remote.default-remote-dispatcher-5] [akka.tcp://
>>>>>>> ClusterSystem@127.0.0.1:2553/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%4010.141.4.104%3A2551-0]
>>>>>>> Association with remote system [akka.tcp://
>>>>>>> ClusterSystem@10.141.4.104:2551] has failed, address is now gated
>>>>>>> for [5000] ms.
>>>>>>> Reason is: [].
>>>>>>
>>>>>>
>>>>>> and then:
>>>>>>
>>>>>> [ClusterSystem-cluster-dispatcher-15] [akka.tcp://
>>>>>> ClusterSystem@127.0.0.1:2553/system/cluster/core/daemon] Cluster
>>>>>> Node [akka.tcp://ClusterSystem@127.0.0.1:2553] - Marking node(s) as
>>>>>> UNREACHABLE [Member(address = akka.tcp://
>>>>>> ClusterSystem@10.141.4.104:2551, status = Up)]
>>>>>>
>>>>>> and eventually:
>>>>>>
>>>>>> [ClusterSystem-cluster-dispatcher-26] [Cluster(akka://ClusterSystem)]
>>>>>> Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2553] - Leader is
>>>>>> auto-downing unreachable node [akka.tcp://
>>>>>> ClusterSystem@10.141.4.104:2551]
>>>>>> [ClusterSystem-cluster-dispatcher-26] [Cluster(akka://ClusterSystem)]
>>>>>> Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2553] - Marking
>>>>>> unreachable node [akka.tcp://ClusterSystem@10.141.4.104:2551] as
>>>>>> [Down]
>>>>>> [ClusterSystem-cluster-dispatcher-27] [Cluster(akka://ClusterSystem)]
>>>>>> Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2553] - Leader is
>>>>>> removing unreachable node [akka.tcp://ClusterSystem@10.141.4.104:2551
>>>>>> ]
>>>>>>
>>>>>>
>>>>>> After I subscribed to AssociationErrorEvent, I was able to get more
>>>>>> details:
>>>>>>
>>>>>> AssociationErrorEvent has occurred: AssociationError [akka.tcp://
>>>>>>> ClusterSystem@127.0.0.1:2553] -> [akka.tcp://
>>>>>>> ClusterSystem@10.141.4.104:2551]: Error [] [
>>>>>>> akka.remote.EndpointException:
>>>>>>> at
>>>>>>> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>>>>>>> at
>>>>>>> akka.remote.ContainerFormats$Selection$Builder.build(ContainerFormats.java:1513)
>>>>>>> at
>>>>>>> akka.remote.ContainerFormats$SelectionEnvelope$Builder.addPattern(ContainerFormats.java:931)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer$$anonfun$serializeSelection$1.apply(MessageContainerSerializer.scala:45)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer$$anonfun$serializeSelection$1.apply(MessageContainerSerializer.scala:43)
>>>>>>> at
>>>>>>> scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>>> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>>>>>> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer.serializeSelection(MessageContainerSerializer.scala:43)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer.toBinary(MessageContainerSerializer.scala:25)
>>>>>>> at
>>>>>>> akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:36)
>>>>>>> at
>>>>>>> akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:842)
>>>>>>> at
>>>>>>> akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:842)
>>>>>>> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>>>>>> at akka.remote.EndpointWriter.serializeMessage(Endpoint.scala:841)
>>>>>>> at akka.remote.EndpointWriter.writeSend(Endpoint.scala:742)
>>>>>>> at
>>>>>>> akka.remote.EndpointWriter$$anonfun$2.applyOrElse(Endpoint.scala:717)
>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>>>> at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:410)
>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>> Caused by: com.google.protobuf.UninitializedMessageException:
>>>>>>> Message missing required fields:
>>>>>>> ... 30 more
>>>>>>> ]
>>>>>>> akka.remote.EndpointException:
>>>>>>> at
>>>>>>> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>>>>>>> at
>>>>>>> akka.remote.ContainerFormats$Selection$Builder.build(ContainerFormats.java:1513)
>>>>>>> at
>>>>>>> akka.remote.ContainerFormats$SelectionEnvelope$Builder.addPattern(ContainerFormats.java:931)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer$$anonfun$serializeSelection$1.apply(MessageContainerSerializer.scala:45)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer$$anonfun$serializeSelection$1.apply(MessageContainerSerializer.scala:43)
>>>>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>>> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>>>>>> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer.serializeSelection(MessageContainerSerializer.scala:43)
>>>>>>> at
>>>>>>> akka.remote.serialization.MessageContainerSerializer.toBinary(MessageContainerSerializer.scala:25)
>>>>>>> at
>>>>>>> akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:36)
>>>>>>> at
>>>>>>> akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:842)
>>>>>>
>>>>>>
>>>>>>
>>>>>> At first I thought that there was a serialization problem with one of
>>>>>> the messages that are sent to or from the worker. However, the problem
>>>>>> repeats itself even when there are no messages sent to the worker at all.
>>>>>> If I restart the worker, it re-joins the cluster and everything is
>>>>>> back to normal again (until the next disconnection event) - so the
>>>>>> problem
>>>>>> isn't permanent.
>>>>>>
>>>>>> I'm using Akka 2.3.9 on both master and worker.
>>>>>>
>>>>>> What could be causing the problem? Could it be Android related?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>> >>>>>>>>>> Check the FAQ:
>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>> >>>>>>>>>> Search the archives:
>>>>>> https://groups.google.com/group/akka-user
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Akka User List" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to akka-user+unsubscribe@googlegroups.com.
>>>>>> To post to this group, send email to akka-user@googlegroups.com.
>>>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> √
>>>>>
>>>> >>> >>> -- >>> >>>>>>>>>> Read the docs: http://akka.io/docs/ >>> >>>>>>>>>> Check the FAQ: >>> http://doc.akka.io/docs/akka/current/additional/faq.html >>> >>>>>>>>>> Search the archives: >>> https://groups.google.com/group/akka-user >>> --- >>> You received this message because you are subscribed to a topic in the >>> Google Groups "Akka User List" group. >>> To unsubscribe from this topic, visit >>> https://groups.google.com/d/topic/akka-user/EfTyabqQyK8/unsubscribe. >>> To unsubscribe from this group and all its topics, send an email to >>> [email protected]. >>> To post to this group, send email to [email protected]. >>> >>> Visit this group at http://groups.google.com/group/akka-user. >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- > >>>>>>>>>> Read the docs: http://akka.io/docs/ > >>>>>>>>>> Check the FAQ: > http://doc.akka.io/docs/akka/current/additional/faq.html > >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user > --- > You received this message because you are subscribed to the Google Groups > "Akka User List" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/akka-user. > For more options, visit https://groups.google.com/d/optout. > -- >>>>>>>>>> Read the docs: http://akka.io/docs/ >>>>>>>>>> Check the FAQ: >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user --- You received this message because you are subscribed to the Google Groups "Akka User List" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/akka-user. For more options, visit https://groups.google.com/d/optout.
