Looks like ClusterSharding.start is missing at startup of api-0. Are you
sure that you call it as soon as the ActorSystem has been started, and not
lazily on the first request?
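
Something like this, where startAllShardRegions and routes are just
stand-ins for your own code:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.stream.ActorMaterializer

object Main extends App {
  implicit val system = ActorSystem("actor-system")
  implicit val materializer = ActorMaterializer()

  // Eagerly register every shard region here, before the HTTP routes are
  // bound, so no request can arrive first. startAllShardRegions is a
  // hypothetical helper wrapping all of your ClusterSharding.start calls.
  startAllShardRegions(system)

  Http().bindAndHandle(routes, "0.0.0.0", 8080) // routes: your API routes
}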

Enable debug-level logging and you should see what it is doing; in
particular, there should be sharding log entries in api-0 at startup.
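
For example, in your application.conf:

akka {
  loglevel = "DEBUG"
}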

/Patrik
tis 1 aug. 2017 kl. 15:39 skrev Stephen Kennedy <[email protected]>:

> As a follow up, I've now ruled out any kubernetes networking issue by
> recreating the same behaviour running 3 nodes via docker-compose instead -
> again, it only works if an API call hits the first node in the seed-nodes
> list.
>
>
> On Tuesday, 1 August 2017 14:11:07 UTC+1, Stephen Kennedy wrote:
>>
>> Hi,
>>
>> We're seeing some strange issues running cluster-sharded actors in a
>> Kubernetes environment.
>>
>> We are currently running a 3-node akka cluster, with our app running
>> within a docker container in a kubernetes stateful set (similar to the
>> akka-seed set described here
>> <https://medium.com/google-cloud/clustering-akka-in-kubernetes-with-statefulset-and-deployment-459c0e05f2ea>).
>> The nodes are all akka-http API servers, which run behind an HTTP load
>> balancer and use akka-persistence for our domain entities. We use
>> cluster sharding to ensure that each entity only lives on a single node
>> at a time.
>>
>> So the cluster config of our akka nodes looks something like this:
>>
>> akka {
>>   remote {
>>     enabled-transports = ["akka.remote.netty.tcp"]
>>     netty.tcp {
>>       hostname = ${POD_NAME}.api
>>       port = 2551
>>     }
>>   }
>>
>>   cluster {
>>     seed-nodes = [
>>       "akka.tcp://[email protected]:2551",
>>       "akka.tcp://[email protected]:2551",
>>       "akka.tcp://[email protected]:2551"
>>     ]
>>   }
>> }
>>
>>
>> Where "api" is the kubernetes service (which provides DNS mapping) and
>> "api-0/1/2" are the consistent pod names that using a stateful set gives
>> us. Using the default sharding config.
>>
>> And within the code we have a number of calls to ClusterSharding.start,
>> one for each type of our sharded entity actors. We then only fire a
>> message to these actors when an appropriate API call comes in.
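>>
>> Simplified, each of those calls looks something like this (the actual
>> props and message extractors are elided; Protocol.Get(id) and the shard
>> count are just stand-ins):
>>
>> import akka.cluster.sharding.{ ClusterSharding, ClusterShardingSettings, ShardRegion }
>>
>> // Assumed extractors: Protocol.Get(id) stands in for our real protocol.
>> val extractEntityId: ShardRegion.ExtractEntityId = {
>>   case msg @ EnvelopeAggregate.Protocol.Get(id) => (id, msg)
>> }
>> val extractShardId: ShardRegion.ExtractShardId = {
>>   case EnvelopeAggregate.Protocol.Get(id) =>
>>     (math.abs(id.hashCode) % 30).toString // 30 shards assumed
>> }
>>
>> val envelopeRegion = ClusterSharding(system).start(
>>   typeName = "EnvelopeShard",
>>   entityProps = EnvelopeAggregate.props(), // props factory elided
>>   settings = ClusterShardingSettings(system),
>>   extractEntityId = extractEntityId,
>>   extractShardId = extractShardId)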
>>
>> Now when the nodes come up they all consistently connect to the cluster
>> properly, and I see gossip messages suggesting they all know about each
>> other, but we are then seeing communication issues on the Cluster Sharded
>> actors.
>>
>> As far as I can tell, if the first API request for a particular type of
>> entity comes into api-1 or api-2, it often fails because that node is
>> unable to communicate with the coordinator - which it seems to think is on
>> api-0.
>>
>> From api-1:
>> 2017-08-01 12:22:15.253 DEBUG akka.actor.ActorSystemImpl -
>> http://xxxxxxxxxxx/envelopes - HttpMethod(GET) - Starting
>> 2017-08-01 12:23:05.336 WARN  akka.cluster.sharding.ShardRegion - Trying
>> to register to coordinator at
>> [Some(ActorSelection[Anchor(akka.tcp://[email protected]:2551/),
>> Path(/system/sharding/EnvelopeShardCoordinator/singleton/coordinator)])],
>> but no acknowledgement. Total [10] buffered messages.
>> 2017-08-01 12:23:07.336 WARN  akka.cluster.sharding.ShardRegion - Trying
>> to register to coordinator at
>> [Some(ActorSelection[Anchor(akka.tcp://[email protected]:2551/),
>> Path(/system/sharding/EnvelopeShardCoordinator/singleton/coordinator)])],
>> but no acknowledgement. Total [10] buffered messages.
>> 2017-08-01 12:23:09.336 WARN  akka.cluster.sharding.ShardRegion - Trying
>> to register to coordinator at
>> [Some(ActorSelection[Anchor(akka.tcp://[email protected]:2551/),
>> Path(/system/sharding/EnvelopeShardCoordinator/singleton/coordinator)])],
>> but no acknowledgement. Total [10] buffered messages.
>> akka.pattern.AskTimeoutException: Ask timed out on
>> [Actor[akka://actor-system/system/sharding/EnvelopeShard#92546893]] after
>> [15000 ms]. Sender[null] sent message of type
>> "com.goodlord.server.domain.envelope.EnvelopeAggregate$Protocol$Get".
>> 2017-08-01 12:23:10.344 DEBUG akka.actor.ActorSystemImpl -
>> http://xxxxxxxxxxx/envelopes - HttpMethod(GET) - 500 Internal Server
>> Error
>>
>> The "Trying to register" log lines then repeat forever, and if more API
>> calls come into this node (for this shard) the buffered count just goes
>> up. Logging into the
>>
>> However, if we subsequently hit the API and the load balancer chooses
>> api-0, then it works and also seems to initialize the coordinator (which
>> then triggers api-1 to register, clearing its backlog):
>>
>> From api-0:
>> 2017-08-01 12:25:05.902 DEBUG akka.actor.ActorSystemImpl -
>> http://xxxxxxxxxxx/envelopes - HttpMethod(GET) - Starting
>> 2017-08-01 12:25:05.932 INFO  a.c.s.ClusterSingletonManager - Singleton
>> manager starting singleton actor
>> [akka://actor-system/system/sharding/EnvelopeShardCoordinator/singleton]
>> 2017-08-01 12:25:05.933 DEBUG akka.cluster.ddata.Replicator - Received
>> Get for key [EnvelopeShardCoordinatorState]
>> 2017-08-01 12:25:07.338 DEBUG a.c.sharding.DDataShardCoordinator -
>> ShardRegion registered: [Actor[akka.tcp://[email protected]
>> :2551/system/sharding/EnvelopeShard#92546893]]
>> 2017-08-01 12:25:07.339 DEBUG akka.cluster.ddata.Replicator - Received
>> Update for key [EnvelopeShardCoordinatorState]
>> 2017-08-01 12:25:07.341 DEBUG a.c.sharding.DDataShardCoordinator - The
>> coordinator state was successfully updated with
>> ShardRegionRegistered(Actor[akka.tcp://[email protected]
>> :2551/system/sharding/EnvelopeShard#92546893])
>> 2017-08-01 12:25:07.341 DEBUG akka.cluster.ClusterRemoteWatcher -
>> Watching:
>> [akka://actor-system/system/sharding/EnvelopeShardCoordinator/singleton/coordinator
>> -> akka.tcp://[email protected]:2551/system/sharding/EnvelopeShard]
>> 2017-08-01 12:25:07.345 DEBUG akka.cluster.ddata.Replicator - Received
>> Update for key [EnvelopeShardCoordinatorState]
>> 2017-08-01 12:25:07.348 DEBUG a.c.sharding.DDataShardCoordinator - The
>> coordinator state was successfully updated with
>> ShardHomeAllocated(22,Actor[akka.tcp://[email protected]
>> :2551/system/sharding/EnvelopeShard#92546893])
>>
>> From api-1:
>>
>> 2017-08-01 12:25:07.336 WARN  akka.cluster.sharding.ShardRegion - Trying
>> to register to coordinator at
>> [Some(ActorSelection[Anchor(akka.tcp://[email protected]:2551/),
>> Path(/system/sharding/EnvelopeShardCoordinator/singleton/coordinator)])],
>> but no acknowledgement. Total [10] buffered messages.
>> 2017-08-01 12:25:07.342 DEBUG akka.cluster.ClusterRemoteWatcher -
>> Watching: [akka://actor-system/system/sharding/EnvelopeShard ->
>> akka.tcp://[email protected]
>> :2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator]
>> 2017-08-01 12:25:07.342 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [22] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [1] buffered messages.
>> 2017-08-01 12:25:07.342 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [13] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [1] buffered messages.
>> 2017-08-01 12:25:07.343 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [16] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [1] buffered messages.
>> 2017-08-01 12:25:07.343 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [6] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [2] buffered messages.
>> 2017-08-01 12:25:07.343 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [29] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [1] buffered messages.
>> 2017-08-01 12:25:07.343 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [20] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [1] buffered messages.
>> 2017-08-01 12:25:07.344 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [21] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [2] buffered messages.
>> 2017-08-01 12:25:07.344 WARN  akka.cluster.sharding.ShardRegion - Retry
>> request for shard [10] homes from coordinator at
>> [Actor[akka.tcp://[email protected]:2551/system/sharding/EnvelopeShardCoordinator/singleton/coordinator#1730016074]].
>> [1] buffered messages.
>> 2017-08-01 12:25:07.353 DEBUG akka.cluster.sharding.ShardRegion - Shard
>> [22] located at
>> [Actor[akka://actor-system/system/sharding/EnvelopeShard#92546893]]
>> 2017-08-01 12:25:07.354 DEBUG akka.cluster.sharding.ShardRegion - Shard
>> [13] located at
>> [Actor[akka://actor-system/system/sharding/EnvelopeShard#92546893]]
>> 2017-08-01 12:25:07.362 DEBUG akka.cluster.sharding.ShardRegion - Shard
>> [16] located at
>> [Actor[akka://actor-system/system/sharding/EnvelopeShard#92546893]]
>>
>> So it seems to me that the initial call to ClusterSharding.start doesn't
>> actually register the shard regions across the cluster, and that the
>> initialization only happens when we subsequently fire a message to the
>> shard region.
>>
>> Is this expected behaviour? Or have we got something wrong with the
>> initialization of our app?
>>
>> Sure enough, if I hack together some code to fire a random message to the
>> shard region on startup, all our problems go away.
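>>
>> Something like (simplified; the warm-up id and Get constructor are just
>> placeholders):
>>
>> // Hypothetical warm-up: touching the region once at startup forces it
>> // to register with the coordinator before any real request arrives.
>> envelopeRegion ! EnvelopeAggregate.Protocol.Get("warm-up-id")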
>>
>> Thanks in advance for any help here - we don't have a great deal of
>> experience with akka clustering, so I'm sure we've done something stupid
>> somewhere.
>>
>> Cheers!
>> Stephen
>>
