Hi,

I have done some deeper digging into this, and it seems that the issue is
not that the Terminated message isn't received. The sharded actor keeps
running on both nodes of the cluster!
The shard region correctly resolves where to send messages, so this is hard
to spot as long as you only use the ShardRegion to communicate with the
actor (in that case the extra copy just sits idle on the node it was
supposed to be removed from).
In other cases, for example when you rely on DeathWatch, things won't work
as expected because the actor is still alive.

It looks like the issue lies with the Remember Entities feature that was
added in 2.4.
Depending on that setting, either the Shard class (persistent-entities=off)
or PersistentShard (persistent-entities=on) is used.

In the ShardRegion class there are two properties:
    var shards = Map.empty[ShardId, ActorRef]
    var shardsByRef = Map.empty[ActorRef, ShardId]
I assume there are two of them just to enable faster lookup, depending on
whether you use a ShardId or an ActorRef as the key. But I also assume that
they should be consistent with each other (and they're not).
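To make "consistent" concrete, here is a minimal sketch of the invariant I
would expect to hold, assuming the two maps are meant to be exact inverses
of each other. It runs without Akka: `Ref` is a hypothetical stand-in for
ActorRef, and `ShardMaps.consistent` is a hypothetical helper, not part of
ShardRegion.

```scala
// Hypothetical stand-in for akka.actor.ActorRef so the sketch runs without Akka.
final case class Ref(path: String)

object ShardMaps {
  type ShardId = String

  // Assumed invariant: the two maps are inverses of each other. Every
  // (id -> ref) entry in `shards` must appear as (ref -> id) in
  // `shardsByRef`; equal sizes then rule out extra entries in shardsByRef.
  def consistent(shards: Map[ShardId, Ref], shardsByRef: Map[Ref, ShardId]): Boolean =
    shards.size == shardsByRef.size &&
      shards.forall { case (id, ref) => shardsByRef.get(ref).contains(id) }
}
```

During the window described below, shardsByRef already holds the new shard
while shards does not, so a check like this would return false.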
What happens when the bug manifests (with remember entities on, and
therefore the PersistentShard class) is that the shards map is updated only
after the PersistentShard has finished its recovery, when the region
receives ShardInitialized:
    case ShardInitialized(shardId) ⇒ initializeShard(shardId, sender())
This introduces a gap in which the state is inconsistent: shardsByRef holds
entries that shards does not, and some decisions are made based on that
inconsistent state.
For example, the sharded actors might not get stopped if the recovery has
not finished yet (but the actor is already started, so it will be up
eventually):
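    case msg @ HandOff(shard) ⇒
      log.debug("HandOff shard [{}]", shard)
      if (shardBuffers.contains(shard)) {
        shardBuffers -= shard
        loggedFullBufferWarning = false
      }
      // shards only contains the shard after ShardInitialized, i.e. after
      // recovery for PersistentShard. During that gap the shard is already
      // running (it is in shardsByRef), yet we fall through to the else branch.
      if (shards.contains(shard)) {
        handingOff += shards(shard)
        shards(shard) forward msg
      } else
        // Reports the shard as stopped even though it may still be running.
        sender() ! ShardStopped(shard)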
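To illustrate the ordering problem, here is a deliberately simplified,
single-threaded model of the bookkeeping described above. It is NOT Akka's
implementation: `GapModel`, `Region`, `Ref`, `running` and
`ForwardedHandOff` are hypothetical names, and the methods only mirror the
order in which getShard, ShardInitialized and the HandOff branch quoted
above touch the two maps.

```scala
// Hypothetical, simplified model of the ShardRegion bookkeeping; not Akka code.
object GapModel {
  type ShardId = String
  final case class Ref(path: String) // stand-in for ActorRef

  sealed trait Reply
  case object ForwardedHandOff extends Reply
  final case class ShardStopped(shard: ShardId) extends Reply

  final class Region {
    var shards = Map.empty[ShardId, Ref]
    var shardsByRef = Map.empty[Ref, ShardId]
    var running = Set.empty[Ref] // shard actors that have been started

    // Starting a shard records it only in shardsByRef ...
    def getShard(id: ShardId): Ref = {
      val ref = Ref(s"/shard/$id")
      shardsByRef += ref -> id
      running += ref
      ref
    }

    // ... shards is filled in later, when ShardInitialized arrives
    // (for PersistentShard, only after its recovery has completed).
    def shardInitialized(id: ShardId, ref: Ref): Unit =
      shards += id -> ref

    // Mirrors the HandOff branch: only shards is consulted.
    def handOff(id: ShardId): Reply =
      shards.get(id) match {
        case Some(ref) =>
          running -= ref // the real shard would stop its entities and itself
          ForwardedHandOff
        case None =>
          ShardStopped(id) // nothing is stopped in this branch
      }
  }
}
```

If HandOff arrives before ShardInitialized, the model answers ShardStopped
while the shard it started is still counted as running, which matches the
"alive on both nodes" symptom.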



Is there a reason why shards and shardsByRef are not updated atomically?
For example, the getShard method only updates the shardsByRef property,
whereas it seems natural to update the shards property in the same place
(as opposed to updating shards only after the recovery is completed).
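A minimal sketch of that suggestion, under the same simplifications as
before: both maps are written in the same step of getShard, and whether the
shard has reported ShardInitialized is tracked in a separate set instead of
being inferred from shards. `AtomicModel`, `Region`, `initialized`,
`handOffTarget` and `lookupByRef` are hypothetical names for illustration
only, not a proposal for Akka's actual API.

```scala
// Hypothetical variant of the bookkeeping; not Akka code.
object AtomicModel {
  type ShardId = String
  final case class Ref(path: String) // stand-in for ActorRef

  final class Region {
    private var shards = Map.empty[ShardId, Ref]
    private var shardsByRef = Map.empty[Ref, ShardId]
    private var initialized = Set.empty[ShardId]

    // Both lookups are updated together, so they never disagree.
    def getShard(id: ShardId): Ref =
      shards.getOrElse(id, {
        val ref = Ref(s"/shard/$id")
        shards += id -> ref
        shardsByRef += ref -> id
        ref
      })

    // Initialization is tracked separately from "the shard exists".
    def shardInitialized(id: ShardId): Unit = initialized += id
    def isInitialized(id: ShardId): Boolean = initialized(id)

    // A started-but-recovering shard is still known here, so a HandOff
    // could reach it instead of being answered with ShardStopped.
    def handOffTarget(id: ShardId): Option[Ref] = shards.get(id)
    def lookupByRef(ref: Ref): Option[ShardId] = shardsByRef.get(ref)
  }
}
```

Whether a HandOff for a shard that is still recovering should then be
forwarded, buffered, or delayed is a separate design decision; the sketch
only shows that the two maps themselves need not diverge.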

I hope my post is not too confusing; the problem is easier to show directly
in the code than with a few fragments in a post.

Thanks a lot,
Marcin

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
