A little more investigation suggests that we were wrong about the UIDs
being the issue.
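
A rough sanity check points the same way. It assumes each new ActorSystem picks its UID uniformly at random from the 32-bit int range. That is our assumption; the UID in the log below, -1829330708, is at least a 32-bit int. Under it, a collision between two incarnations across a few hundred test runs is very unlikely. Below is a minimal birthday-bound sketch; the class and method names are our own.

```java
// Birthday-bound estimate of a UID collision, ASSUMING each ActorSystem
// picks its UID uniformly at random from the 2^32 possible int values.
public final class UidCollision {

    // Number of distinct 32-bit values (the assumed UID space).
    private static final double UID_SPACE = (double) (1L << 32);

    /** Probability that at least two of n independently drawn UIDs are equal. */
    public static double collisionProbability(int n) {
        // P(no collision) = product over i = 0..n-1 of (1 - i / UID_SPACE)
        double noCollision = 1.0;
        for (int i = 0; i < n; i++) {
            noCollision *= 1.0 - i / UID_SPACE;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        // 100 runs x 2 freshly created systems (ports 2552 and 2553).
        int incarnations = 200;
        System.out.printf("P(any UID collision among %d incarnations) ~ %.2e%n",
                incarnations, collisionProbability(incarnations));
    }
}
```

For 200 incarnations this comes out on the order of one in a million. So "not very unique" UIDs would not explain a failure after about 30 runs.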

We get the following log:
12:45:57.499 [clusterTestActorSystem-akka.actor.default-dispatcher-5] INFO a.c.Cluster(akka://clusterTestActorSystem) - Cluster Node [akka.ssl.tcp://[email protected]:2551] - New incarnation of existing member [Member(address = akka.ssl.tcp://[email protected]:2553, status = Up)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.

12:45:57.500 [clusterTestActorSystem-akka.actor.default-dispatcher-5] INFO a.c.Cluster(akka://clusterTestActorSystem) - Cluster Node [akka.ssl.tcp://[email protected]:2551] - Marking unreachable node [akka.ssl.tcp://[email protected]:2553] as [Down]

12:45:58.178 [clusterTestActorSystem-akka.actor.default-dispatcher-17] INFO a.c.Cluster(akka://clusterTestActorSystem) - Cluster Node [akka.ssl.tcp://[email protected]:2551] - Leader is removing unreachable node [akka.ssl.tcp://[email protected]:2553]

12:45:58.187 [clusterTestActorSystem-akka.actor.default-dispatcher-19] WARN akka.remote.Remoting - Association to [akka.ssl.tcp://[email protected]:2553] having UID [-1829330708] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.

Which I interpret as:

   - Because we turned off auto-downing, nodes that are unreachable or
   terminated remain in the cluster member list.
   - When we create a new incarnation of the node, the cluster realises
   it's a new incarnation (due to the unique UID).
      - The cluster removes the old incarnation.
      - And then immediately quarantines the new incarnation. (This is an
      assumption: I can't tell whether the quarantined UID belongs to the
      old or the new incarnation, and have assumed it is the new one since
      the node now never gets moved to Up.)


There are some obvious fixes we will carry out; for instance, we should
manually down the nodes. But it does seem peculiar that, when removing an
old incarnation of a node, the cluster seemingly quarantines the new
incarnation.
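
To make the puzzle concrete, here is a toy Java model of what the log messages describe. It is not Akka's implementation, and the class, method, host and UID values are hypothetical. It rests on three simplifying assumptions:
- the member list has one slot per host:port;
- each incarnation is identified by host:port:uid;
- quarantine is keyed by the full identity, including the UID, as the Remoting warning ("UID is now quarantined") suggests.

In Akka the quarantine comes from Remoting after the leader removes the node; the model folds both steps into join() for brevity. In this model only the old incarnation ends up quarantined and the new one is admitted, which is exactly why what we observe looks odd.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model, NOT Akka's implementation. Assumptions (hypothetical names):
//  - each incarnation is identified by host:port:uid
//  - the member list has one slot per host:port
//  - quarantine is keyed by the full identity, including the UID
public final class IncarnationModel {

    public record UniqueAddress(String host, int port, int uid) {
        String hostPort() {
            return host + ":" + port;
        }
    }

    // host:port -> incarnation currently in the member list
    private final Map<String, UniqueAddress> members = new HashMap<>();
    // incarnations (full host:port:uid) that have been quarantined
    private final Set<UniqueAddress> quarantined = new HashSet<>();

    /** Returns true if the joining incarnation is admitted. */
    public boolean join(UniqueAddress joining) {
        if (quarantined.contains(joining)) {
            return false; // this exact incarnation was quarantined earlier
        }
        UniqueAddress existing = members.get(joining.hostPort());
        if (existing != null && existing.uid() != joining.uid()) {
            // Mirrors the log: the existing member is removed before the new
            // incarnation is allowed to join; only the OLD uid is quarantined.
            members.remove(existing.hostPort());
            quarantined.add(existing);
        }
        members.put(joining.hostPort(), joining);
        return true;
    }

    public boolean isMember(UniqueAddress a) {
        return a.equals(members.get(a.hostPort()));
    }

    public boolean isQuarantined(UniqueAddress a) {
        return quarantined.contains(a);
    }

    public static void main(String[] args) {
        IncarnationModel cluster = new IncarnationModel();
        // Hypothetical host and UIDs; both incarnations use port 2553.
        UniqueAddress oldNode = new UniqueAddress("host-a", 2553, 1);
        UniqueAddress newNode = new UniqueAddress("host-a", 2553, 2);
        cluster.join(oldNode);
        cluster.join(newNode);
        System.out.println("old quarantined: " + cluster.isQuarantined(oldNode)); // true
        System.out.println("new quarantined: " + cluster.isQuarantined(newNode)); // false
        System.out.println("new is member:   " + cluster.isMember(newNode));      // true
    }
}
```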

Thanks kindly,
Daniel Stoner

On 25 February 2016 at 10:57, Daniel Stoner <[email protected]> wrote:

> Hi,
>
> Recently we've been setting up some testing of our application when
> running as a cluster. We start one actor system on port 2551 as part of
> our test suite.
>
> As part of this individual test we then start further servers on port 2552
> and 2553.
>
> This works great - we have a counting actor that shows the cluster has
> received MemberUp for all 3 nodes and our test succeeds.
>
> We thought we'd take it to the next level and use the IntelliJ IDE's
> feature for running our test suite 100 times, to check this wasn't a
> fluke. When we did so, we spotted some peculiar behaviour.
>
> For context: the actor system on port 2551 never gets stopped, but the
> actor systems on ports 2552/2553 get started during the individual test
> and stopped at its end. These are always brand-new instances from
> ActorSystem.create(....); we are not simply stopping/starting these servers.
>
> After about 30 runs of this test, it's very likely that during shutdown
> 2552 and 2553 both become quarantined by 2551 (not a surprise).
> What is a surprise is that when we re-create a brand-new ActorSystem on
> 2552/2553, it is seen as the same original server (hostname, port, uid)
> and quarantine behaviour kicks in (i.e. no-one will talk to it, and the
> test fails on all further runs).
>
> From this piece of documentation:
> http://doc.akka.io/docs/akka/2.4.2/common/cluster.html
> "The identifier for each node is a hostname:port:uid tuple"
>
> So obviously the hostname and port remain the same when we shut down and
> restart, but how is the 'uid' generated?
> Is it based on the JVM/OS thread that things are being created in, or is
> it user-configurable? It seems our problem is that when creating new
> actor systems we are getting UIDs which aren't very unique.
>
> Firstly, is there a way we can see the ActorSystem's UID to confirm this
> is the case, and secondly, is there some way we can specify the UID used
> (to enforce uniqueness)?
>
> Thanks kindly,
> Daniel Stoner
>
> --
> Daniel Stoner | Senior Software Engineer UtopiaIT | Ocado Technology
> [email protected] | Ext 7969 | www.ocadotechnology.com
>
>



--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.