Hi Robert,

On Wed, Nov 5, 2014 at 9:55 PM, Robert Preissl <[email protected]>
wrote:

> hello!
>
> I am having a problem in my remote Akka production system, which consists
> of 3 nodes running the latest version of Akka (2.3.6).
>
> In more detail, I am experiencing errors with "*rolling restarts*" of
> the cluster (for deployment etc.; we cannot afford any downtime), where a
> restart happens in the following sequence:
> 1.) restart node1 and node2.
> 2.) once 1. completed, restart node3.
>
> *But we only observe failures once there is load (even a small load) on
> the system*. So, I want to describe two scenarios:
>
>
> *Scenario 1 - no load on the system: Restart works.*
>
> if there is no load on the system at all, the restarting seems to work
> fine. I.e., with detailed logging I can observe that node3 logs the
> following events: (in chronological order)
>
> 13:09:48.769 WARN  
> [akka.tcp://DivaPCluster@NODE_3:8900/system/endpointManager/reliableEndpointWriter-akka.tcp0-1]
> akka.remote.ReliableDeliverySupervisor - Association with remote system
> [akka.tcp://DivaPCluster@NODE_2:8900] has failed, address is now gated
> for [5000] ms. Reason is: [Disassociated].
> 13:09:48.823 WARN  
> [akka.tcp://DivaPCluster@NODE_3:8900/system/endpointManager/reliableEndpointWriter-akka.tcp0-0]
> akka.remote.ReliableDeliverySupervisor - Association with remote system
> [akka.tcp://DivaPCluster@NODE_1:8900] has failed, address is now gated
> for [5000] ms. Reason is: [Disassociated].
>
> 13:10:10.661 DEBUG [Remoting] Remoting - Associated
> [akka.tcp://DivaPCluster@NODE_3:8900] <- [akka.tcp://DivaPCluster@NODE_2:8900]
> 13:10:10.987 DEBUG [Remoting] Remoting - Associated
> [akka.tcp://DivaPCluster@NODE_3:8900] <- [akka.tcp://DivaPCluster@NODE_1:8900]
>
> Since node1 and node2 restart, it is fine that the association is gated
> between node3 -> node1 (and between node3 -> node2) for a while.
> And I assume it becomes active again since "a successful inbound
> connection is accepted from a remote system during Gate it automatically
> transitions to Active" (as you describe in
> http://doc.akka.io/docs/akka/snapshot/java/remoting.html)
>
> This can be verified: the logs on node1 show that it tries to connect
> right after the restart, at 13:10:10.861 (and the connection
> node3 -> node1 becomes active on node3 at 13:10:10.987, as you can see
> above).
>
> so, everything cool here and the system restarts fine!
>
>
>
> *Scenario 2 - light load on the system: Restart fails due to unrecoverable
> "gated" state*
>
> Similar to Scenario 1 above, I can observe the "gated" messages for links
> node3 -> node1 and node3 -> node2.
>
> However, I never see the links become active again! The restart never
> recovers, and I need to manually stop my nodes and start them up again.
>
> This is surprising since I clearly see that node1 and node2 (after they
> restarted) send messages to node3, and node3 successfully logs the
> reception of these messages.
>
> So, why does the connection not become active again in this scenario? It
> is a successful inbound connection, which should make the link active
> again, as you describe on your site.
>


If node3 receives messages, then that link is active -- what do you mean by
"does not become active again"? Do you lose messages from node3? This is
not clear from your explanation.

We need logs for this; otherwise we cannot tell what the problem is.
You can enable the following settings:

akka.remote.log-received-messages = on
akka.remote.log-sent-messages = on

so that all send attempts and received messages are logged at DEBUG level.
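
As a minimal sketch, assuming these settings go into the application.conf
of each node, the two flags above could be written as shown here. The
akka.loglevel line is an assumption and not part of the two settings named
above: since the messages are logged at DEBUG level, DEBUG output has to be
enabled, and the logging backend in use must let it through as well.

```
akka {
  # Assumption: DEBUG must be enabled, otherwise the entries below are dropped
  loglevel = "DEBUG"

  remote {
    # Log every message received from a remote system (DEBUG level)
    log-received-messages = on
    # Log every attempt to send a message to a remote system (DEBUG level)
    log-sent-messages = on
  }
}
```

This is very verbose under load, so it is probably best enabled only while
reproducing the failing restart.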


>
>
> Any help on this is greatly appreciated; otherwise we need to roll back to
> Scala 2.10 (or 2.9) and an older version of Akka.
>

If this is an Akka bug, it will very likely be fixed in 2.3.7. We might need
your help, though, to reproduce and debug it.

-Endre


> Thanks,
> Robert
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
