Re: [akka-user] Re: Seed nodes behavior in 3 node scenario in version 2.5.0

Patrik Nordwall Tue, 05 Dec 2017 23:33:07 -0800

Ok, also make sure you use the latest patch version. 2.5.0 is old; 2.5.8
should be available today or in a few days.


On Wed, Dec 6, 2017 at 6:06 AM, Muthukumaran Kothandaraman <
[email protected]> wrote:

> Thanks Patrik.
>
> In fact, I use Cluster JMX + Jolokia to see how the status and the
> unreachable-node list change, and it's perfectly in line with the documented
> behavior. For now, I am planning to use the JMX client to monitor the
> formation of minority partitions and raise alarms. Later we may consider
> using the partition info to take actions such as isolating nodes.
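A minimal Java sketch of the alarm logic described above. It assumes the `Unreachable` attribute of Akka's Cluster MBean has already been read (via JMX or Jolokia, not shown) as a comma-separated list of member addresses; `UnreachableDiff` and its methods are hypothetical helpers, not part of Akka.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: compares two successive polls of the Cluster MBean's
// "Unreachable" attribute (assumed to be a comma-separated address list)
// and reports which members became unreachable or reachable again.
final class UnreachableDiff {
    final Set<String> newlyUnreachable;
    final Set<String> recovered;

    private UnreachableDiff(Set<String> newlyUnreachable, Set<String> recovered) {
        this.newlyUnreachable = newlyUnreachable;
        this.recovered = recovered;
    }

    // Splits the attribute value; an empty string means "no unreachable members".
    static Set<String> parse(String attribute) {
        Set<String> result = new LinkedHashSet<>();
        if (attribute == null) return result;
        for (String part : attribute.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) result.add(trimmed);
        }
        return result;
    }

    static UnreachableDiff between(String previousPoll, String currentPoll) {
        Set<String> before = parse(previousPoll);
        Set<String> after = parse(currentPoll);
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);   // unreachable now, but not in the previous poll
        Set<String> removed = new LinkedHashSet<>(before);
        removed.removeAll(after);  // unreachable before, reachable again now
        return new UnreachableDiff(added, removed);
    }

    // Alarm condition: this node reaches at most half of the members,
    // so it may be on the minority side of a partition.
    static boolean possiblyInMinority(int totalMembers, int unreachableCount) {
        int reachable = totalMembers - unreachableCount;
        return reachable * 2 <= totalMembers;
    }
}
```

Diffing successive polls, rather than alarming on the raw list, means an alarm fires once per transition instead of on every poll.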
>
> We are using the OpenDaylight RAFT consensus cluster built atop Akka.
>
> Regards
> Muthu
>
>
> On Tuesday, 5 December 2017 22:35:05 UTC+5:30, Patrik Nordwall wrote:
>>
>>
>>
>> On Mon, Dec 4, 2017 at 5:32 PM, Muthukumaran Kothandaraman <
>> [email protected]> wrote:
>>
>>> Further observations
>>>
>>> I started suspecting that auto-down-unreachable-after could play a
>>> significant role in the double-fault scenario below. As suspected, when
>>> auto-down-unreachable-after was enabled with a 10s delay, nodes moved from
>>> the unreachable 'condition' (I understand that unreachable is mainly a
>>> flag in the cluster status and not a state by itself) to the Removed
>>> state, and TEST SCENARIO 3 below then allowed the 3rd node to join after
>>> all nodes 'saw' the 3rd node as Down.
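The effect of auto-down described above can be modeled roughly as follows. This is a simplified sketch, not Akka's implementation: `AutoDownModel` and `membersToDown` are hypothetical, and the map is assumed to record when each member was first observed as unreachable. With the 10s setting from the test, `autoDownAfter` would be `Duration.ofSeconds(10)`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified model of auto-down-unreachable-after: once a member has been
// continuously unreachable for the configured duration, the leader marks it
// Down, after which it can be removed. Not Akka's actual code.
final class AutoDownModel {
    static List<String> membersToDown(Map<String, Instant> unreachableSince,
                                      Instant now, Duration autoDownAfter) {
        List<String> toDown = new ArrayList<>();
        for (Map.Entry<String, Instant> e : unreachableSince.entrySet()) {
            Duration unreachableFor = Duration.between(e.getValue(), now);
            if (unreachableFor.compareTo(autoDownAfter) >= 0) {
                toDown.add(e.getKey());  // timer expired: treat as Down
            }
        }
        return toDown;
    }
}
```

The decision is purely time based: nothing in it distinguishes a crashed node from one that is merely on the other side of a network partition.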
>>>
>>
>> I think you will see in the logs for scenario 3 that the 3rd node is able
>> to join (log in the 2nd node) and that it receives the Welcome message (log
>> in the 3rd node). It is not moved to the Up state because there are still
>> unreachable nodes in the cluster, i.e. the 1st node is still unreachable
>> and not removed. That is why you see that it works when you enable
>> auto-down.
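To illustrate the explanation above, here is a simplified Java model of the rule (hypothetical names, not Akka's code). The leader only moves Joining members to Up, and Down members to Removed, when there is convergence, and an unreachable member blocks convergence unless it is marked Down (Akka also exempts Exiting members, which this model omits). The `allowWeaklyUp` flag mirrors the idea behind the commented-out `allow-weakly-up-members` setting in the configuration further down; the real feature has extra conditions that are not modeled here.

```java
import java.util.List;

// Simplified model of why a joining node stays in Joining: leader actions
// need convergence, and an unreachable member that is not Down blocks it.
final class LeaderModel {
    enum Status { JOINING, WEAKLY_UP, UP, DOWN, REMOVED }

    static final class Member {
        final String address;
        Status status;
        final boolean reachable;

        Member(String address, Status status, boolean reachable) {
            this.address = address;
            this.status = status;
            this.reachable = reachable;
        }
    }

    // Convergence is blocked by any unreachable member that is not Down.
    static boolean convergence(List<Member> members) {
        for (Member m : members) {
            if (!m.reachable && m.status != Status.DOWN) return false;
        }
        return true;
    }

    // One round of leader actions over the current membership.
    static void leaderActions(List<Member> members, boolean allowWeaklyUp) {
        boolean converged = convergence(members);
        for (Member m : members) {
            if (converged) {
                if (m.status == Status.JOINING || m.status == Status.WEAKLY_UP) {
                    m.status = Status.UP;
                } else if (m.status == Status.DOWN) {
                    m.status = Status.REMOVED;
                }
            } else if (allowWeaklyUp && m.status == Status.JOINING) {
                m.status = Status.WEAKLY_UP;  // usable, but not yet Up
            }
        }
    }
}
```

Running scenario 3 through this model: while the 1st node is unreachable and not Down, the restarted 3rd node stays Joining. Once the 1st node is marked Down (manually or by auto-down), the next round promotes the 3rd node to Up and removes the 1st.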
>>
>> If you also start the 1st node and it joins again, then it will work
>> without auto-down as well, because a node joining with the same hostname
>> and port (but a different uid) is enough evidence to safely remove the old
>> incarnation.
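The "new incarnation" rule above can be sketched like this. It is a hypothetical model, not Akka's code, and it assumes members are identified by a `host:port` string plus a numeric uid. A join from a known host:port with a different uid marks the old member Down; the new node keeps retrying its join and is admitted once the old entry has been removed.

```java
import java.util.List;

// Simplified model: same host:port with a new uid proves the old incarnation
// is gone, so it can safely be marked Down without any downing timeout.
final class IncarnationModel {
    enum Status { JOINING, UP, DOWN }

    enum JoinOutcome { ALREADY_MEMBER, OLD_INCARNATION_DOWNED, ADDED }

    static final class Member {
        final String hostPort;
        final long uid;
        Status status;

        Member(String hostPort, long uid, Status status) {
            this.hostPort = hostPort;
            this.uid = uid;
            this.status = status;
        }
    }

    static JoinOutcome handleJoin(List<Member> members, String hostPort, long uid) {
        for (Member m : members) {
            if (m.hostPort.equals(hostPort)) {
                if (m.uid == uid) return JoinOutcome.ALREADY_MEMBER;
                m.status = Status.DOWN;                    // old incarnation is dead
                return JoinOutcome.OLD_INCARNATION_DOWNED; // new node retries later
            }
        }
        members.add(new Member(hostPort, uid, Status.JOINING));
        return JoinOutcome.ADDED;
    }
}
```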
>>
>>
>>>
>>> I understand very well the consequences of enabling the
>>> auto-down-unreachable-after config parameter. It can lead to an array of
>>> issues, including but not limited to false positives in case of network
>>> partitions.
>>>
>>> I would like the clustering experts to comment on whether there are any
>>> alternatives to enabling auto-down-unreachable-after.
>>>
>>
>> You'll probably find https://github.com/akka/akka-management useful when
>> you test those things. With it you can see the current membership state
>> and also down the 1st node manually. In the end you need some kind of
>> downing strategy, and, as you know, Lightbend has one as a commercial
>> offering.
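Lightbend's commercial downing strategies are not shown here. As a hedged illustration of the general idea only, the sketch below implements a simple "keep majority" decision (hypothetical names). Each node looks at its own view of reachable members, including itself. The side holding more than half of the members downs the unreachable ones, and the other side downs itself. On an exact tie, the side containing the lowest address survives; plain string order is an assumption of this sketch. Unlike auto-down, in a clean partition only one side keeps running.

```java
import java.util.Collections;
import java.util.Set;

// Sketch of a "keep majority" downing decision, evaluated by each node using
// its own view of which members (including itself) are reachable.
final class KeepMajority {
    enum Decision { DOWN_UNREACHABLE, DOWN_SELF }

    static Decision decide(Set<String> allMembers, Set<String> reachable) {
        int total = allMembers.size();
        int reachableCount = reachable.size();
        if (reachableCount * 2 > total) return Decision.DOWN_UNREACHABLE;
        if (reachableCount * 2 < total) return Decision.DOWN_SELF;
        // Equal halves: the side that contains the lowest address survives.
        String lowest = Collections.min(allMembers);
        return reachable.contains(lowest) ? Decision.DOWN_UNREACHABLE : Decision.DOWN_SELF;
    }
}
```

With auto-down, both sides of a partition would eventually down each other and continue as two separate clusters; here exactly one side of a clean partition decides to continue.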
>>
>> Regards,
>> Patrik
>>
>>
>>>
>>> Thanks in advance
>>>
>>> Regards
>>> Muthu
>>>
>>>
>>>
>>>
>>>
>>> On Monday, 4 December 2017 14:34:15 UTC+5:30, Muthukumaran Kothandaraman
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to validate the statements below from the Akka docs with a
>>>> slight twist in the scenario: after the first seed node is brought down,
>>>> and before restarting it, I try to restart another surviving node of the
>>>> cluster.
>>>>
>>>> "*When a new node is started it sends a message to all seed nodes and
>>>> then sends join command to the one that answers first.*"
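The documented join rule quoted above can be modeled as below. It is a deterministic simplification with hypothetical names: real nodes send the requests concurrently and react to whichever reply arrives first, while here each seed node's response time is given as input and a `null` latency stands for a seed node that is down and never answers.

```java
import java.util.Map;
import java.util.Optional;

// Simplified model: contact all seed nodes, join the one that answers first.
final class SeedJoinModel {
    static Optional<String> pickSeedToJoin(Map<String, Long> responseLatencyMillis) {
        String best = null;
        long bestLatency = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : responseLatencyMillis.entrySet()) {
            Long latency = e.getValue();
            if (latency != null && latency < bestLatency) {  // null: no answer
                best = e.getKey();
                bestLatency = latency;
            }
        }
        return Optional.ofNullable(best);  // empty: nobody answered, retry later
    }
}
```

In TEST SCENARIO 3 the 1st seed node never answers, so the restarted 3rd node sends its join to the 2nd node.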
>>>>
>>>> In conjunction with
>>>>
>>>> "*Once more than two seed nodes have been started it is no problem to
>>>> shut down the first seed node. If the first seed node is restarted, it will
>>>> first try to join the other seed nodes in the existing cluster.* "
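A hedged sketch of the start-up rule quoted above (hypothetical names, simplified). Only the first node in `seed-nodes` may join itself, and only after no other seed node has answered within `seed-node-timeout`; every other seed node just keeps retrying, which is what prevents separate "islands" from forming. Under this rule, the restarted 3rd node in TEST SCENARIO 3 can only join via the 2nd node and never forms a new cluster on its own.

```java
import java.time.Duration;

// Simplified model of which seed node is allowed to bootstrap a new cluster.
final class FirstSeedRule {
    enum Action { JOIN_ANSWERING_SEED, JOIN_SELF, KEEP_RETRYING }

    static Action decide(boolean isFirstSeed, boolean anotherSeedAnswered,
                         Duration waited, Duration seedNodeTimeout) {
        if (anotherSeedAnswered) return Action.JOIN_ANSWERING_SEED;  // existing cluster found
        if (isFirstSeed && waited.compareTo(seedNodeTimeout) >= 0) {
            return Action.JOIN_SELF;  // only the first seed may form a new cluster
        }
        return Action.KEEP_RETRYING;
    }
}
```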
>>>>
>>>> My configuration for 3 nodes (i.e. 3 JVMs running on the same machine)
>>>> is as follows, in the same order:
>>>>
>>>>  cluster {
>>>>
>>>>    // URL format is 'akka://' for artery. When netty is to be enabled,
>>>>    // change this to 'akka.tcp://'
>>>>    seed-nodes = [
>>>>      "akka.tcp://[email protected]:25510",   // 1st Seed Node
>>>>      "akka.tcp://[email protected]:25520",   // 2nd Seed Node
>>>>      "akka.tcp://[email protected]:25530"    // 3rd Seed Node
>>>>    ]
>>>>
>>>>    seed-node-timeout = 12s
>>>>
>>>>    #auto-down-unreachable-after = 10s
>>>>
>>>>    #allow-weakly-up-members = on
>>>>  }
>>>>
>>>>
>>>>
>>>> *TEST SCENARIO 1 (PASSED)*: Bring up the nodes in any order and form the
>>>> cluster.
>>>>
>>>>
>>>> *TEST SCENARIO 2 (PASSED)*: After the cluster is formed, bring down the
>>>> first seed node (i.e. akka.tcp://[email protected]:25510).
>>>>
>>>>     Ensure the remaining cluster works without any disruption.
>>>>
>>>>
>>>> *TEST SCENARIO 3 (NOT PASSING)*: Bring down the 3rd node
>>>> (akka.tcp://[email protected]:25530) while the 1st seed node is still
>>>> DOWN, and then bring the 3rd node up again.
>>>>
>>>>     As I understand the first statement quoted from the documentation,
>>>>     the 3rd seed node must be able to join the 2nd seed node even when
>>>>     the 1st node is down.
>>>>
>>>>     But the observation is that the 3rd seed node does NOT join the
>>>>     cluster.
>>>>
>>>>
>>>> Of course, if only the 2nd node is surviving and I bring up the 1st as
>>>> well as the 3rd node, the cluster forms correctly again.
>>>>
>>>>
>>>>
>>>> *Clarification :*
>>>>
>>>>
>>>> 1) The documentation mentions "once more than two seed nodes have been
>>>> started". Does that mean my TEST SCENARIO 3 above is invalid? In other
>>>> words, must at least 2 seed nodes of the cluster be alive at any point in
>>>> time for TEST SCENARIO 3 above to pass?
>>>>
>>>>
>>>> 2) And if there are double faults like TEST SCENARIO 3, will the cluster
>>>> not converge until we perform a full cluster reboot or bring back the
>>>> first node in the seed-node list?
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> Muthu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Akka User List" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/akka-user.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>>
>> Patrik Nordwall
>> Akka Tech Lead
>> Lightbend <http://www.lightbend.com/> -  Reactive apps on the JVM
>> Twitter: @patriknw
>>
>



-- 

Patrik Nordwall
Akka Tech Lead
Lightbend <http://www.lightbend.com/> -  Reactive apps on the JVM
Twitter: @patriknw

