Hi Patrik,
Unfortunately I cannot provide the logs since I no longer have them.
However, I can provide all my Akka Cluster configs (I am using non-default
parameters). Let me also give more details: we are trying to form the
cluster while the nodes are already running under some load (at least 50%
CPU busy). When they try to join the cluster, it takes some time for the
seed nodes to respond to the join request. Some nodes are able to join
after a few minutes, some shut down due to the auto-down timeout, and (I
believe) some hit the seed-node-timeout and try a different seed node.
When that happens, I see two different scenarios:
- All nodes shut down and never join the cluster.
- A weird scenario where all nodes are members of the same cluster. This is
where the potential issue could be: some nodes believe the leader is one
node (let's say A) while other nodes believe the leader is another node
(let's say B). A and B are both marked as members of the same cluster. A
marks B as unreachable and B marks A as unreachable. This situation never
ends: both keep logging that the other is unreachable, and the cluster
never reaches convergence ("leader can currently not perform its
duties..."). I didn't see either of the leaders trying to auto-down the
other.
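
To make clear which settings I am referring to above, here is a minimal
sketch of the relevant keys under akka.cluster in Akka 2.3. The values and
the seed-node list are illustrative placeholders only, not the
configuration actually in use; the defaults noted in the comments are, as
far as I know, those from the 2.3 reference.conf.

```hocon
# Sketch only: placeholder values, NOT the real configuration of this cluster.
akka.cluster {
  # Hypothetical seed-node list; the host names are reused from the log line
  # in the original report, but which nodes really are seeds is an assumption.
  seed-nodes = [
    "akka.tcp://Main@node1:2551",
    "akka.tcp://Main@node2:2551"
  ]

  # How long a joining node waits for a seed node to answer before it
  # moves on to another seed node (default: 5s).
  seed-node-timeout = 5s

  # How long to wait before retrying a join attempt that has not
  # succeeded (default: 10s).
  retry-unsuccessful-join-after = 10s

  # The "auto-down timeout": the leader marks an unreachable node as down
  # after this duration (default: off). 10s is a placeholder.
  auto-down-unreachable-after = 10s

  # The leader does not move JOINING members to Up until this many members
  # have joined (default: 1). 9 is a placeholder matching the cluster size.
  min-nr-of-members = 9
}
```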
I can try to put together some code to reproduce it with the basics of my
project. Would that be good enough?
Thanks for the support!!
Héctor.
On Tuesday, June 30, 2015 at 2:23:02 (UTC-5), Patrik Nordwall wrote:
>
> Hi Héctor,
>
> Thanks for reporting. Can you provide full logs from all nodes? Can you
> minimize the problem, perhaps with 2 or 3 nodes?
> We might need logs at DEBUG level, but we could start with looking at INFO
> level.
>
> Regards,
> Patrik
>
> On Mon, Jun 29, 2015 at 9:46 PM, Héctor Veiga wrote:
>
>> Hi,
>>
>> We have a 9 node Akka Cluster and we have noticed some behavior changes
>> since we updated from 2.3.9 to 2.3.11.
>> We are seeing nodes getting stuck while trying to JOIN the cluster:
>>
>> 2015-06-29 19:31:58,394 INFO [Main-akka.actor.default-dispatcher-4] -
>> Cluster Node [akka.tcp://Main@node1:2551] - Node
>> [akka.tcp://Main@node2:2551] is JOINING, roles []
>>
>> It remains in this state until it decides to shut down since it was not
>> able to join the cluster. My cluster does not start since there is a
>> minimum number of members required.
>>
>> Was there anything that substantially changed from 2.3.9 to 2.3.11
>> regarding the joining logic?
>>
>> I tried to look for an open bug/ticket on GitHub but couldn't find
>> anything.
>>
>> Thank you,
>>
>> Héctor Veiga.
>>
>> --
>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>> >>>>>>>>>> Check the FAQ:
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Akka User List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/akka-user.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
>
> Patrik Nordwall
> Typesafe <http://typesafe.com/> - Reactive apps on the JVM
> Twitter: @patriknw
>
>