I am struggling to get Akka clustering working with Docker in a sane way. I can get it working if I use --net="host" and the local loopback (127.0.0.1), but if I try to use the Docker containers' own IPs, Akka remoting seems to fall down somewhere.
Here is my setup:

- running Akka 2.3.14 (including akka cluster, akka contrib, and akka persistence)
- running vagrant ubuntu trusty64
- vagrant running docker 1.7.1
- all this on my mac

If I run only a single node, everything works. If I run multiple nodes on 127.0.0.1 with --net="host", everything works; otherwise, all bets are off.

I present the clues below as best I can. Please let me know if there is something I am missing. So far, nothing has seemed to work.

Heartbeating appears to work most of the time. I see this message in the log files on the seed node:

    00:55:01.583UTC [test] DEBUG akka.cluster.ClusterHeartbeatSender akka.tcp://[email protected]:2551/system/cluster/core/daemon/heartbeatSender - Cluster Node [akka.tcp://[email protected]:2551] - Heartbeat response from [akka.tcp://[email protected]:2552]

And on the cluster node:

    00:55:40.845UTC [test] DEBUG akka.cluster.ClusterHeartbeatSender akka.tcp://[email protected]:2552/system/cluster/core/daemon/heartbeatSender - Cluster Node [akka.tcp://[email protected]:2552] - Heartbeat response from [akka.tcp://[email protected]:2551]

Every now and then, things get wonky in Akka remoting land. This is from the node:

    00:54:05.843UTC [test] INFO a.r.transport.ProtocolStateActor akka.tcp://[email protected]:2552/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-5 - No response from remote. Handshake timed out or transport failure detector triggered.
    00:54:05.843UTC [test] DEBUG akka.remote.EndpointWriter akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-0/endpointWriter - Disassociated [akka.tcp://[email protected]:2552] -> [akka.tcp://[email protected]:2551]
    00:54:05.843UTC [test] WARN a.remote.ReliableDeliverySupervisor akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-0 - Association with remote system [akka.tcp://[email protected]:2551] has failed, address is now gated for [5000] ms. Reason: [Disassociated]

The big issue appears to be when sending packets. I am seeing this:

    00:49:49.415UTC [test] ERROR akka.remote.EndpointWriter akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-0/endpointWriter - AssociationError [akka.tcp://[email protected]:2552] -> [akka.tcp://[email protected]:2551]: Error [Message is null] [
    akka.actor.InvalidMessageException: Message is null
        at akka.dispatch.Envelope$.apply(AbstractDispatcher.scala:27)
        at akka.actor.Cell$class.sendMessage(ActorCell.scala:290)
        at akka.actor.ActorCell.sendMessage(ActorCell.scala:369)
        at akka.actor.LocalActorRef.$bang(ActorRef.scala:384)

I have no idea what that is. I believe this is what is causing my functional tests to fail.

Here is my relevant application configuration:

    akka {
      remote {
        log-remote-lifecycle-events = on
        netty.tcp {
          hostname = ${denis.app.host}
          port = ${denis.app.port}
        }
        transport-failure-detector {
          heartbeat-interval = 30 s           # default 4s
          acceptable-heartbeat-pause = 10 s   # default 10s
        }
      }
      cluster {
        seed-nodes = [ ${denis.app.seed-node} ]
        auto-down-unreachable-after = 10s
      }
    }

All of the variables actually come from environment variables.
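For reference, here is a minimal sketch of how keys like denis.app.host could be wired to environment variables in HOCON. The real definitions are not shown in this post, so the default values and the self-referencing seed-node fallback are assumptions; APP_HOST, APP_PORT, and SEED_NODE are the variable names used when the containers are started.

```
# Hypothetical wiring, not the actual file: each key gets an assumed default
# and is overridden only when the matching environment variable is set.
denis.app {
  host = "127.0.0.1"          # assumed default
  host = ${?APP_HOST}
  port = 2551                 # assumed default
  port = ${?APP_PORT}
  # Assumed fallback: with no SEED_NODE set, the node lists itself as the seed.
  seed-node = "akka.tcp://Denis@"${denis.app.host}":"${denis.app.port}
  seed-node = ${?SEED_NODE}
}
```

The `${?NAME}` form is an optional substitution: when NAME is not defined, the line is skipped and the earlier value stays in effect.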
The most important one, the host, is passed in when the docker container starts up:

    export APP_HOST=`ip addr show eth0 | grep 'inet ' | awk '{print $2}' | cut -f1 -d'/'`

Since the docker container is built on centos7, this gets the eth0 IP address.

Finally, starting up the seed node looks like this (by not passing in a seed node, the config asserts that the current node is the seed node):

    docker run --name seed -p 9000:9000 -p 2551:2551 -d -e "APP_PORT=2551" -e "REST_PORT=9000"

I wait until the HTTP port is accessible, then I start the node that wants to join:

    APP_ADDRESS=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' seed)
    docker run --name node1 -p 9001:9000 -p 2552:2551 -d -e "SEED_NODE=akka.tcp://Denis@${APP_ADDRESS}:2551" -e "APP_PORT=2552" -e "REST_PORT=9001"

So, weird, right? The point where my test fails seems to be around the "Message is null" error. It is also weird that the heartbeats seem to stop periodically, while at other times they work fine.

Here are some more clues, if they help:

********* The node startup: *********

    01:07:00.340UTC [test] DEBUG akka.remote.EndpointWriter akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.14%3A2551-0/endpointWriter - Associated [akka.tcp://[email protected]:2552] -> [akka.tcp://[email protected]:2551]
    01:07:00.414UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message [akka.cluster.InternalClusterAction$InitJoin$]
    01:07:00.543UTC [test] DEBUG akka.remote.EndpointWriter akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.14%3A2551-0/endpointWriter - Drained buffer with maxWriteCount: 50, fullBackoffCount: 1, smallBackoffCount: 0, noBackoffCount: 0 , adaptiveBackoff: 1000
    01:07:00.683UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message [akka.cluster.InternalClusterAction$Join]
    01:07:01.504UTC [test] INFO Cluster(akka://Denis) Cluster(akka://Denis) - Cluster Node [akka.tcp://[email protected]:2552] - Welcome from [akka.tcp://[email protected]:2551]
    01:07:01.519UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message [akka.cluster.GossipEnvelope]
    01:07:01.544UTC [test] DEBUG akka.contrib.pattern.ShardRegion akka.tcp://[email protected]:2552/user/sharding/Zone - Coordinator moved from [] to [akka.tcp://[email protected]:2551]
    01:07:01.545UTC [test] DEBUG a.c.pattern.ClusterSingletonProxy akka.tcp://[email protected]:2552/user/zoneManagerProxy - Creating singleton identification timer...
    01:07:01.547UTC [test] DEBUG a.c.pattern.ClusterSingletonProxy akka.tcp://[email protected]:2552/user/zoneManagerProxy - Trying to identify singleton at akka.tcp://[email protected]:2551/user/zoneManagerSingleton/zoneManager
    01:07:01.588UTC [test] DEBUG akka.cluster.ClusterCoreDaemon akka.tcp://[email protected]:2552/system/cluster/core/daemon - Cluster Node [akka.tcp://[email protected]:2552] - Receiving gossip from [UniqueAddress(akka.tcp://[email protected]:2551,1649876987)]
    01:07:01.848UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.serialization.JavaSerializer] for message [akka.contrib.pattern.ShardCoordinator$Internal$Register]
    01:07:01.889UTC [test] DEBUG akka.cluster.ClusterCoreDaemon akka.tcp://[email protected]:2552/system/cluster/core/daemon - Cluster Node [akka.tcp://[email protected]:2552] - Receiving gossip from [UniqueAddress(akka.tcp://[email protected]:2551,1649876987)]
    01:07:01.903UTC [test] INFO a.c.pattern.ClusterSingletonManager akka.tcp://[email protected]:2552/user/zoneManagerSingleton - ClusterSingletonManager state change [Start -> Younger]
    01:07:01.913UTC [test] INFO a.c.pattern.ClusterSingletonManager akka.tcp://[email protected]:2552/user/sharding/ZoneCoordinator - ClusterSingletonManager state change [Start -> Younger]
    01:07:01.938UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.serialization.JavaSerializer] for message [akka.actor.Identify]
    01:07:02.000UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message [akka.cluster.MetricsGossipEnvelope]
    01:07:02.039UTC [test] DEBUG akka.cluster.ClusterCoreDaemon akka.tcp://[email protected]:2552/system/cluster/core/daemon - Cluster Node [akka.tcp://[email protected]:2552] - Receiving gossip from [UniqueAddress(akka.tcp://[email protected]:2551,1649876987)]
    01:07:02.040UTC [test] INFO a.c.pattern.ClusterSingletonProxy akka.tcp://[email protected]:2552/user/zoneManagerProxy - Singleton identified: akka.tcp://[email protected]:2551/user/zoneManagerSingleton/zoneManager
    01:07:02.068UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message [akka.cluster.GossipStatus]
    01:07:02.104UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.serialization.JavaSerializer] for message [akka.dispatch.sysmsg.Watch]

*** WHEN THE SEED FIRST SEES THE NODE CONNECT ***

    01:07:01.333UTC [test] DEBUG a.contrib.pattern.ShardCoordinator akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator - ShardRegion registered: [Actor[akka://Denis/user/sharding/Zone#-1003758096]]
    01:07:01.334UTC [test] DEBUG a.contrib.pattern.ShardCoordinator akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator - ShardRegion registered: [Actor[akka://Denis/user/sharding/Zone#-1003758096]]
    01:07:01.337UTC [test] DEBUG a.contrib.pattern.ShardCoordinator akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator - ShardRegion registered: [Actor[akka://Denis/user/sharding/Zone#-1003758096]]
    01:07:01.337UTC [test] DEBUG a.contrib.pattern.ShardCoordinator akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator - ShardRegion registered: [Actor[akka://Denis/user/sharding/Zone#-1003758096]]
    01:07:01.473UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message [akka.cluster.GossipEnvelope]
    01:07:01.478UTC [test] INFO Cluster(akka://Denis) Cluster(akka://Denis) - Cluster Node [akka.tcp://[email protected]:2551] - Leader is moving node [akka.tcp://[email protected]:2552] to [Up]
    01:07:01.868UTC [test] DEBUG akka.cluster.ClusterCoreDaemon akka.tcp://[email protected]:2551/system/cluster/core/daemon - Cluster Node [akka.tcp://[email protected]:2551] - Receiving gossip from [UniqueAddress(akka.tcp://[email protected]:2552,528567380)]
    01:07:01.937UTC [test] DEBUG a.contrib.pattern.ShardCoordinator akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator - ShardRegion registered: [Actor[akka.tcp://[email protected]:2552/user/sharding/Zone#186999354]]
    01:07:01.968UTC [test] DEBUG a.s.Serialization(akka://Denis) akka.serialization.Serialization(akka://Denis) - Using serializer[akka.serialization.JavaSerializer] for message [akka.actor.ActorIdentity]
    01:07:01.982UTC [test] DEBUG s.can.client.HttpHostConnectionSlot akka.tcp://[email protected]:2551/user/IO-HTTP/host-connector-0/11 - Dispatching POST request to / across connection Actor[akka://Denis/user/IO-HTTP/group-0/11#21105308]
    01:07:01.996UTC [test] DEBUG akka.cluster.ClusterCoreDaemon akka.tcp://[email protected]:2551/system/cluster/core/daemon - Cluster Node [akka.tcp://[email protected]:2551] - Receiving gossip from [UniqueAddress(akka.tcp://[email protected]:2552,528567380)]
    01:07:02.010UTC [test] DEBUG s.can.client.HttpHostConnectionSlot akka.tcp://[email protected]:2551/user/IO-HTTP/host-connector-0/11 - Delivering 200 OK response for POST request to /
    01:07:02.011UTC [test] DEBUG akka.cluster.ClusterCoreDaemon akka.tcp://[email protected]:2551/system/cluster/core/daemon - Cluster Node [akka.tcp://[email protected]:2551] - Receiving gossip from [UniqueAddress(akka.tcp://[email protected]:2552,528567380)]
