I am struggling to get Akka clustering working with Docker in a sane way.

I can get it working if I use --net="host" and the local loopback 
(127.0.0.1), but if I use the Docker containers' own IP addresses instead, 
Akka remoting seems to fall down somewhere.

Here is my setup:
- Akka 2.3.14 (including akka-cluster, akka-contrib, and akka-persistence)
- Vagrant box running Ubuntu trusty64
- Docker 1.7.1 running inside the Vagrant box
- all of this on my Mac

If I run only a single node, everything works.  If I run multiple nodes on 
127.0.0.1 and --net="host" everything works; otherwise, all bets are off.

I present the clues below as best I can.  Please let me know if there is 
something I am missing.  So far, nothing has seemed to work.

Heartbeating appears to work most of the time. I see this message in the 
log files on the seed node:

00:55:01.583UTC [test] DEBUG akka.cluster.ClusterHeartbeatSender 
akka.tcp://[email protected]:2551/system/cluster/core/daemon/heartbeatSender 
- Cluster Node [akka.tcp://[email protected]:2551] - Heartbeat response 
from [akka.tcp://[email protected]:2552]


And on the cluster node:

00:55:40.845UTC [test] DEBUG akka.cluster.ClusterHeartbeatSender 
akka.tcp://[email protected]:2552/system/cluster/core/daemon/heartbeatSender 
- Cluster Node [akka.tcp://[email protected]:2552] - Heartbeat response 
from [akka.tcp://[email protected]:2551]


Every now and then, though, things get wonky in Akka remoting land. This is 
from the joining node:

00:54:05.843UTC [test] INFO  a.r.transport.ProtocolStateActor 
akka.tcp://[email protected]:2552/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-5
- No response from remote. Handshake timed out or transport failure 
detector triggered.

00:54:05.843UTC [test] DEBUG akka.remote.EndpointWriter 
akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-0/endpointWriter
- Disassociated [akka.tcp://[email protected]:2552] -> 
[akka.tcp://[email protected]:2551]

00:54:05.843UTC [test] WARN  a.remote.ReliableDeliverySupervisor 
akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-0
- Association with remote system [akka.tcp://[email protected]:2551] has 
failed, address is now gated for [5000] ms. Reason: [Disassociated] 


The big issue appears when sending packets. I am seeing this:

00:49:49.415UTC [test] ERROR akka.remote.EndpointWriter 
akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.10%3A2551-0/endpointWriter
- AssociationError [akka.tcp://[email protected]:2552] -> 
[akka.tcp://[email protected]:2551]: Error [Message is null] [
akka.actor.InvalidMessageException: Message is null
    at akka.dispatch.Envelope$.apply(AbstractDispatcher.scala:27)
    at akka.actor.Cell$class.sendMessage(ActorCell.scala:290)
    at akka.actor.ActorCell.sendMessage(ActorCell.scala:369)
    at akka.actor.LocalActorRef.$bang(ActorRef.scala:384)


I have no idea what that is.  I "believe" this is what is causing my 
functional tests to fail.


Here is my relevant application configuration:

akka {

  remote {
    log-remote-lifecycle-events = on
    netty.tcp {
      hostname = ${denis.app.host}
      port = ${denis.app.port}
    }
    transport-failure-detector {
      heartbeat-interval = 30 s   # default 4s
      acceptable-heartbeat-pause = 10 s  # default 10s
    }
  }

  cluster {
    seed-nodes = [
      ${denis.app.seed-node}
    ]
    auto-down-unreachable-after = 10s
  }
}


All of the variables actually come from environment variables.  The most 
important one, the host, is passed in when the docker container starts up:


export APP_HOST=$(ip addr show eth0 | grep 'inet ' | awk '{print $2}' | cut -f1 -d'/')


Since the Docker container is built on CentOS 7, this picks up the eth0 IP 
address.
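The eth0 lookup above can also be written as a small function, which makes it possible to check the parsing against sample `ip addr` output without starting a container. This is only a sketch: the function name `first_ipv4` is hypothetical and not part of the original setup, and it assumes the usual `inet ADDRESS/PREFIX` line format printed by `ip addr show`.

```shell
# first_ipv4: read `ip addr show <iface>` output on stdin and print the
# first IPv4 address without its prefix length.
# (Hypothetical helper name; an assumption, not from the original scripts.)
first_ipv4() {
  # Match only lines whose first field is exactly "inet" (this skips inet6),
  # then strip the "/16"-style suffix from the second field.
  awk '$1 == "inet" { split($2, parts, "/"); print parts[1]; exit }'
}

# Usage inside the container (eth0 is the interface used above):
#   export APP_HOST=$(ip addr show eth0 | first_ipv4)
```

Compared with the grep/awk/cut pipeline, this stops at the first IPv4 address, so APP_HOST stays a single value even if the interface carries more than one.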


Finally, starting up the seed node looks like this (when no seed node is 
passed in, the config treats the current node as the seed node):

docker run --name seed -p 9000:9000 -p 2551:2551 -d \
  -e "APP_PORT=2551" -e "REST_PORT=9000"
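The seed fallback described above lives in the application config, but the same rule can be expressed in shell to check by hand which seed address a container would end up with. A minimal sketch under stated assumptions: `resolve_seed_node` is a hypothetical helper, and the `akka.tcp://Denis@host:port` shape is taken from the commands and logs in this post. It is a stand-in for the config logic, not a copy of it.

```shell
# resolve_seed_node: print the seed-node URI a container should use.
# Hypothetical helper; the original setup does this inside the config.
#   $1 = this node's host, $2 = this node's port, $3 = optional seed URI
resolve_seed_node() {
  host=$1
  port=$2
  seed=${3:-}
  if [ -n "$seed" ]; then
    # A seed was passed in (as SEED_NODE would be): join that node.
    printf '%s\n' "$seed"
  else
    # No seed given: this node acts as its own seed.
    printf 'akka.tcp://Denis@%s:%s\n' "$host" "$port"
  fi
}

# Usage:
#   resolve_seed_node "$APP_HOST" "$APP_PORT" "$SEED_NODE"
```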


I wait until the HTTP port is accessible, then I start the node that wants 
to join...
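The wait step can be sketched as a small retry loop. This is an illustration rather than the script actually used: `retry_until` is a hypothetical helper, and the example call assumes curl is installed and that the seed's REST port is reachable at localhost:9000.

```shell
# retry_until: run a command repeatedly until it succeeds or the attempt
# budget is used up. Hypothetical helper, not from the original scripts.
#   $1 = max attempts, $2 = delay in seconds, remaining args = command
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0          # command succeeded, stop waiting
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1              # gave up after max_attempts tries
}

# Example (assumption: REST port published on localhost:9000):
#   retry_until 30 1 curl -s -o /dev/null http://localhost:9000/
```

The loop only reports whether the command eventually succeeded, so the caller decides what to do on timeout, for example aborting before the joining node is started.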

APP_ADDRESS=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' seed)

docker run --name node1 -p 9001:9000 -p 2552:2551 -d \
  -e "SEED_NODE=akka.tcp://Denis@${APP_ADDRESS}:2551" \
  -e "APP_PORT=2552" -e "REST_PORT=9001"


So, weird, right?  My test failures seem to coincide with the "Message is 
null" error.  It is also odd that the heartbeats seem to stop periodically, 
yet at other times they work fine.


Here are some more clues, in case they help:

********* The node startup: *********

01:07:00.340UTC [test] DEBUG akka.remote.EndpointWriter 
akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.14%3A2551-0/endpointWriter
- Associated [akka.tcp://[email protected]:2552] -> 
[akka.tcp://[email protected]:2551]

01:07:00.414UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message 
[akka.cluster.InternalClusterAction$InitJoin$]

01:07:00.543UTC [test] DEBUG akka.remote.EndpointWriter 
akka.tcp://[email protected]:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FDenis%40172.17.0.14%3A2551-0/endpointWriter
- Drained buffer with maxWriteCount: 50, fullBackoffCount: 1, 
smallBackoffCount: 0, noBackoffCount: 0 , adaptiveBackoff: 1000

01:07:00.683UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message 
[akka.cluster.InternalClusterAction$Join]

01:07:01.504UTC [test] INFO  Cluster(akka://Denis) Cluster(akka://Denis) - 
Cluster Node [akka.tcp://[email protected]:2552] - Welcome from 
[akka.tcp://[email protected]:2551]

01:07:01.519UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message 
[akka.cluster.GossipEnvelope]

01:07:01.544UTC [test] DEBUG akka.contrib.pattern.ShardRegion 
akka.tcp://[email protected]:2552/user/sharding/Zone - Coordinator moved 
from [] to [akka.tcp://[email protected]:2551]

01:07:01.545UTC [test] DEBUG a.c.pattern.ClusterSingletonProxy 
akka.tcp://[email protected]:2552/user/zoneManagerProxy - Creating 
singleton identification timer...

01:07:01.547UTC [test] DEBUG a.c.pattern.ClusterSingletonProxy 
akka.tcp://[email protected]:2552/user/zoneManagerProxy - Trying to 
identify singleton at 
akka.tcp://[email protected]:2551/user/zoneManagerSingleton/zoneManager

01:07:01.588UTC [test] DEBUG akka.cluster.ClusterCoreDaemon 
akka.tcp://[email protected]:2552/system/cluster/core/daemon - Cluster Node 
[akka.tcp://[email protected]:2552] - Receiving gossip from 
[UniqueAddress(akka.tcp://[email protected]:2551,1649876987)]

01:07:01.848UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.serialization.JavaSerializer] for message 
[akka.contrib.pattern.ShardCoordinator$Internal$Register]

01:07:01.889UTC [test] DEBUG akka.cluster.ClusterCoreDaemon 
akka.tcp://[email protected]:2552/system/cluster/core/daemon - Cluster Node 
[akka.tcp://[email protected]:2552] - Receiving gossip from 
[UniqueAddress(akka.tcp://[email protected]:2551,1649876987)]

01:07:01.903UTC [test] INFO  a.c.pattern.ClusterSingletonManager 
akka.tcp://[email protected]:2552/user/zoneManagerSingleton - 
ClusterSingletonManager state change [Start -> Younger]

01:07:01.913UTC [test] INFO  a.c.pattern.ClusterSingletonManager 
akka.tcp://[email protected]:2552/user/sharding/ZoneCoordinator - 
ClusterSingletonManager state change [Start -> Younger]

01:07:01.938UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.serialization.JavaSerializer] for message 
[akka.actor.Identify]

01:07:02.000UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message 
[akka.cluster.MetricsGossipEnvelope]

01:07:02.039UTC [test] DEBUG akka.cluster.ClusterCoreDaemon 
akka.tcp://[email protected]:2552/system/cluster/core/daemon - Cluster Node 
[akka.tcp://[email protected]:2552] - Receiving gossip from 
[UniqueAddress(akka.tcp://[email protected]:2551,1649876987)]

01:07:02.040UTC [test] INFO  a.c.pattern.ClusterSingletonProxy 
akka.tcp://[email protected]:2552/user/zoneManagerProxy - Singleton 
identified: 
akka.tcp://[email protected]:2551/user/zoneManagerSingleton/zoneManager

01:07:02.068UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message 
[akka.cluster.GossipStatus]

01:07:02.104UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.serialization.JavaSerializer] for message 
[akka.dispatch.sysmsg.Watch]


*** WHEN THE SEED FIRST SEES THE NODE CONNECT ***

01:07:01.333UTC [test] DEBUG a.contrib.pattern.ShardCoordinator 
akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator
- ShardRegion registered: 
[Actor[akka://Denis/user/sharding/Zone#-1003758096]]

01:07:01.334UTC [test] DEBUG a.contrib.pattern.ShardCoordinator 
akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator
- ShardRegion registered: 
[Actor[akka://Denis/user/sharding/Zone#-1003758096]]

01:07:01.337UTC [test] DEBUG a.contrib.pattern.ShardCoordinator 
akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator
- ShardRegion registered: 
[Actor[akka://Denis/user/sharding/Zone#-1003758096]]

01:07:01.337UTC [test] DEBUG a.contrib.pattern.ShardCoordinator 
akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator
- ShardRegion registered: 
[Actor[akka://Denis/user/sharding/Zone#-1003758096]]

01:07:01.473UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.cluster.protobuf.ClusterMessageSerializer] for message 
[akka.cluster.GossipEnvelope]

01:07:01.478UTC [test] INFO  Cluster(akka://Denis) Cluster(akka://Denis) - 
Cluster Node [akka.tcp://[email protected]:2551] - Leader is moving node 
[akka.tcp://[email protected]:2552] to [Up]

01:07:01.868UTC [test] DEBUG akka.cluster.ClusterCoreDaemon 
akka.tcp://[email protected]:2551/system/cluster/core/daemon - Cluster Node 
[akka.tcp://[email protected]:2551] - Receiving gossip from 
[UniqueAddress(akka.tcp://[email protected]:2552,528567380)]

01:07:01.937UTC [test] DEBUG a.contrib.pattern.ShardCoordinator 
akka.tcp://[email protected]:2551/user/sharding/ZoneCoordinator/singleton/coordinator
- ShardRegion registered: 
[Actor[akka.tcp://[email protected]:2552/user/sharding/Zone#186999354]]

01:07:01.968UTC [test] DEBUG a.s.Serialization(akka://Denis) 
akka.serialization.Serialization(akka://Denis) - Using 
serializer[akka.serialization.JavaSerializer] for message 
[akka.actor.ActorIdentity]

01:07:01.982UTC [test] DEBUG s.can.client.HttpHostConnectionSlot 
akka.tcp://[email protected]:2551/user/IO-HTTP/host-connector-0/11 - 
Dispatching POST request to / across connection 
Actor[akka://Denis/user/IO-HTTP/group-0/11#21105308]

01:07:01.996UTC [test] DEBUG akka.cluster.ClusterCoreDaemon 
akka.tcp://[email protected]:2551/system/cluster/core/daemon - Cluster Node 
[akka.tcp://[email protected]:2551] - Receiving gossip from 
[UniqueAddress(akka.tcp://[email protected]:2552,528567380)]

01:07:02.010UTC [test] DEBUG s.can.client.HttpHostConnectionSlot 
akka.tcp://[email protected]:2551/user/IO-HTTP/host-connector-0/11 - 
Delivering 200 OK response for POST request to /

01:07:02.011UTC [test] DEBUG akka.cluster.ClusterCoreDaemon 
akka.tcp://[email protected]:2551/system/cluster/core/daemon - Cluster Node 
[akka.tcp://[email protected]:2551] - Receiving gossip from 
[UniqueAddress(akka.tcp://[email protected]:2552,528567380)]
