Hi Rick,

On Fri, Apr 17, 2015 at 1:01 PM, Rick Latrine <[email protected]> wrote:
> Yes, it sounds like the same issue.
>
> The important point is the log message:
> 04:12:24.833 DEBUG Associated [akka.tcp://OurProgrammhostname.com:3552] <- [akka.tcp://GatlingSystem@gatlinghostname:9044]
>
> The reconnect gets associated, but because the internal state of the
> EndpointManager has become inconsistent, an AssertionError is thrown.
>
> As a workaround I wrote an auto-restarter which restarts an ActorSystem in
> case of an unexpected shutdown.
> But the Netty NIO resources were not shut down in an orderly way. The TCP port was
> still bound, so the restarts failed.
>
> Finally I investigated a bit more and found a few things I wouldn't
> do (maybe I don't have enough insight):

Well, remoting deserves the critique. I wrote most of it and I am happy to criticize it myself. We will redesign this subsystem soon.

> - the death of an EndpointManager should not kill the entire ActorSystem
> (why not restart this subsystem?)

This is one of the core subsystems, and it is involved in the startup and shutdown of the whole system. It is not easy to restart this part, also because most of its state must be preserved. I agree that in the future we should compartmentalize it more, but that needs a redesign.

> - as the EndpointManager is doing a lot of things, I would move the
> EndpointManager behind an EndpointSupervisor that ensures liveness
> - resources (like TCP channels) should be closed (at least) automatically
> in postStop

Yes, this is handled in the ordinary shutdown, but not in postStop, because the ordinary shutdown is a two-phase asynchronous process. We should really add extra cleanup to postStop that does not attempt flushing and just terminates everything. Good point.
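A minimal sketch, in Python rather than Scala/Akka, of the distinction above: the ordinary shutdown path flushes pending data before closing, while a postStop-style hook skips flushing and only releases the OS resources, so that a restarted system can bind the same TCP port again. `HypotheticalTransport`, `graceful_shutdown`, `post_stop` and `can_bind` are hypothetical names for illustration, not Akka's API, and the two-phase asynchronous shutdown is simplified here to a synchronous call.

```python
import socket


class HypotheticalTransport:
    """Toy stand-in for a remoting transport that owns a listening TCP socket.

    Illustrative assumption only; this is not how Akka remoting is structured.
    """

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.server.bind((host, port))  # port 0 lets the OS pick a free port
        self.server.listen()
        self.port = self.server.getsockname()[1]
        self.pending = []  # outbound payloads not yet flushed

    def graceful_shutdown(self):
        """Ordinary shutdown (simplified): first flush pending data, then close."""
        flushed, self.pending = self.pending, []
        self.server.close()
        return flushed

    def post_stop(self) -> None:
        """Crash path: no flushing, just drop pending data and release the socket.

        socket.close() is a no-op on an already closed socket, so this is safe
        to call even after graceful_shutdown().
        """
        self.pending.clear()
        self.server.close()


def can_bind(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a fresh socket can bind the port, i.e. nothing still holds it."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:  # typically EADDRINUSE while the old listener is alive
        return False
    finally:
        probe.close()
```

Usage: while a crashed transport's listener is still open, `can_bind(t.port)` is false, which mirrors the failed restarts described above; after `t.post_stop()` the port is free again.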
> - the map to find quarantined UIDs should be a 1 [Address] to N list and
> not 1:1 (have a look in akka.remote.EndpointManager.EndpointRegistry)

The idea was that on a given address (IP:port) there can be exactly one active UID, and therefore you need to track exactly one quarantined UID. I still think this is a valid assumption, but let's see what debugging shows.

All in all, this part needs a better design. The original design evolved too much; eventually it deviated from the initial assumptions and became too tangled. Now we know this problem space well enough, and we already have streams, which simplify many of the issues we faced. I hope we can start improving remoting quite soon.

Thank you very much for finding the root cause; it saves us quite a bit of time.

-Endre

>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
