Hi,

I have a problem where multiple actors for the same entry id are running in 
the cluster. We have two nodes that are both running a ShardRegion and the 
sharded actors. By analyzing logs I've concluded that when a message comes 
in to either node, it is routed to an entry actor running on that same 
node. The two ShardRegions do not seem to have consensus about where the 
entry is actually running. Other entries in the same shard seem to work 
fine; I have only identified a single entry that is in this bad state.

I can see that an entry actor is being created on both of the two nodes:
Node 1: 2015-02-03 17:08:35,773+0000 Info [reactor-akka.actor.default-dispatcher-3@0xb] [com.packagename.service.SessionPollHandler], Instantiated SessionPollHandler for session 104
Node 2: 2015-02-03 17:09:08,599+0000 Info [reactor-akka.actor.default-dispatcher-2@0xa] [com.packagename.service.SessionPollHandler], Instantiated SessionPollHandler for session 104

The entries are sharded on session id. The shard coordinator singleton is 
running on node 2. At 17:09:08 node 2 receives a message which is routed to 
session 104 through the ShardRegion. This triggers the actor creation, 
although the actor is already running on node 1 and the message should 
have been routed there.
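To make the behaviour I expect concrete, here is a minimal sketch in plain Java. It is not Akka code and not our actual implementation: the class names, the shard count of 10 and the "first requester becomes the home" rule are all assumptions made for illustration. It only shows the invariant I am relying on, namely that every region asking where the shard for session 104 lives gets the same answer.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of shard routing, for illustration only (no Akka involved).
final class ShardRoutingSketch {

    // Hypothetical shard count; our real value is not relevant to the sketch.
    static final long NUMBER_OF_SHARDS = 10;

    // Derive the shard id from the session id (the entry id).
    static String shardId(long sessionId) {
        return String.valueOf(Math.floorMod(sessionId, NUMBER_OF_SHARDS));
    }

    // Stand-in for the coordinator: it remembers one home per shard.
    static final class Coordinator {
        private final Map<String, String> shardHomes = new HashMap<>();

        // Assumption: the first region that asks becomes the home of the shard.
        // Later requests from any region must get that same home back.
        String homeFor(String shard, String requestingNode) {
            return shardHomes.computeIfAbsent(shard, key -> requestingNode);
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        String shard = shardId(104);

        // Node 1 asks first, node 2 asks later: both should be told "node1".
        System.out.println("node1 is told: " + coordinator.homeFor(shard, "node1"));
        System.out.println("node2 is told: " + coordinator.homeFor(shard, "node2"));
    }
}
```

In our cluster the second lookup behaves as if it had returned node 2, which is what leads to the second actor for session 104.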

The entry actors use Akka persistence and extend UntypedPersistentActor. 
One of the entry actors is not able to write to the journal and fails with 
a message like:
ERROR[db-async-netty-thread-2] MySQLConnection - Received an error message 
-> ErrorMessage(1062,#23000,Duplicate entry 'SessionPollHandler-104-2' for 
key 'PRIMARY')

I assume this is because the two actors each track their own sequence 
number and disagree about which one is next. This means that every other 
request coming in to the system is served correctly, while every other 
message goes to the bad actor and fails because the event cannot be 
persisted.
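The failure mode I have in mind can be reproduced with a small simulation in plain Java. This is only a sketch under assumptions: the in-memory journal, the class names and the key format "persistence id, dash, sequence number" are my guess based on the error message above, not the real akka-persistence-sql-async schema.

```java
import java.util.HashSet;
import java.util.Set;

// Simulation of two incarnations of one persistent actor sharing a journal.
final class DuplicateSequenceDemo {

    // Hypothetical journal whose primary key is persistenceId + "-" + sequenceNr.
    static final class Journal {
        private final Set<String> primaryKeys = new HashSet<>();

        // Returns false if the key already exists, like MySQL error 1062.
        boolean write(String persistenceId, long sequenceNr) {
            return primaryKeys.add(persistenceId + "-" + sequenceNr);
        }
    }

    // One running copy of the entry actor with its own view of the sequence.
    static final class Incarnation {
        private final String persistenceId;
        private final Journal journal;
        private long lastSequenceNr;

        Incarnation(String persistenceId, long recoveredSequenceNr, Journal journal) {
            this.persistenceId = persistenceId;
            this.lastSequenceNr = recoveredSequenceNr;
            this.journal = journal;
        }

        // Try to persist one event; the counter only advances on success.
        boolean persist() {
            long next = lastSequenceNr + 1;
            if (journal.write(persistenceId, next)) {
                lastSequenceNr = next;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        journal.write("SessionPollHandler-104", 1); // event stored before the split

        // Both copies recovered up to sequence number 1.
        Incarnation onNode1 = new Incarnation("SessionPollHandler-104", 1, journal);
        Incarnation onNode2 = new Incarnation("SessionPollHandler-104", 1, journal);

        System.out.println("node 1 persisted: " + onNode1.persist()); // writes key ...-2
        System.out.println("node 2 persisted: " + onNode2.persist()); // key ...-2 exists
    }
}
```

In this sketch the second copy tries to write a key that the first copy has already written, which matches the duplicate entry 'SessionPollHandler-104-2' that I see in the log.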

I can also see some log messages where the shard coordinator fails to write 
to the database:
Persistent snapshot failure: Error 1062 - #23000 - Duplicate entry 
'/user/sharding/SessionPollShardCoordinator/singleton/coordinator' for key 
'PRIMARY'

None of these happen around the time when the two entry actors are 
created, but some hours before and some hours later. If the shard 
coordinator fails to store its state I can understand that things break, 
for example that no actor is created at all, but I wouldn't expect two 
actors to be created. Even though there are a number of these messages, the 
system seems to be working fine except for this specific entry that is 
misbehaving.

We are currently running Akka 2.3.6, akka-persistence-sql-async 0.1 and 
mysql-async 0.2.15.

Are there any known issues that could cause something like this, which 
have perhaps already been fixed in later versions of Akka? To me it feels 
like a problem in Akka, or could there be something that we are doing wrong?

I am sure I can resolve the issue for now by modifying the journal database 
and/or restarting the nodes. I have not tried to do that yet since I would 
want to understand what is going on and make sure it does not happen again. 
Any ideas would be appreciated.

Thanks,
Magnus
