Hi Rick

On Fri, Apr 17, 2015 at 1:01 PM, Rick Latrine <[email protected]>
wrote:

> Yes, it sounds like the same issue.
>
> The important point is the log message:
> 04:12:24.833 DEBUG Associated [akka.tcp://OurProgrammhostname.com:3552] <-
> [akka.tcp://GatlingSystem@gatlinghostname:9044]
>
> The re-connect is going to be associated, but as the internal state of the
> EndpointManager becomes inconsistent, an AssertionError will be thrown.
>
> As a workaround I wrote an Auto-Restarter which restarts an ActorSystem in
> case of unexpected shutdown.
> But the Netty-NIO stuff was not shut down in an orderly way. The TCP port
> was still bound, so restarts fail.
>
> Finally I investigated a little bit more and found a few things I wouldn't
> do (maybe I have not enough insight):
>

Well, remoting deserves the critique. I wrote most of it and I am happy to
criticize it myself. We will redesign this subsystem soon.


>
> - the death of an EndpointManager should not kill the entire ActorSystem
> (why not restart this subsystem?)
>

This is one of the core subsystems and it is involved in the startup and
shutdown of the whole system. It is also not easy to restart this part,
because most of its state must be preserved. I agree that in the future we
should compartmentalize it more, but that needs a redesign.


> - as the EndpointManager is doing a lot of things, I would move the
> EndpointManager under an EndpointSupervisor that ensures aliveness
>
> - Resources (like TCP-Channels) should be closed (at least) automatically
> in postStop
>

Yeah, it is handled in the ordinary shutdown, but not in postStop, because
the ordinary shutdown is a two-phase async process. We should really add
extra cleanup to postStop that does not attempt flushing and just
terminates everything. Good point.
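
To illustrate the kind of last-resort cleanup meant here, below is a
minimal sketch in plain Java with no Akka dependency. It is not the
remoting code: ResourceOwner, register, openCount and closeAllNow are
hypothetical names made up for this example, and the assumption is that
something like closeAllNow() would be called from a postStop-style hook
when the ordinary two-phase shutdown never got to run.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Akka code: an owner of resources with a
// last-resort cleanup that a postStop-style hook could call.
class ResourceOwner {
    private final List<Closeable> open = new ArrayList<>();

    void register(Closeable resource) {
        open.add(resource);
    }

    int openCount() {
        return open.size();
    }

    // No flushing, no waiting: close whatever is still open.
    // An IOException on one resource must not keep the others open.
    // Returns how many resources closed without an error.
    int closeAllNow() {
        int closedCleanly = 0;
        for (Closeable resource : open) {
            try {
                resource.close();
                closedCleanly++;
            } catch (IOException e) {
                // Ignored on purpose: this is the abnormal-termination path.
            }
        }
        open.clear(); // makes a second call a harmless no-op
        return closedCleanly;
    }

    public static void main(String[] args) {
        ResourceOwner owner = new ResourceOwner();
        owner.register(() -> { throw new IOException("already broken"); });
        owner.register(() -> System.out.println("channel closed"));
        System.out.println("closed cleanly: " + owner.closeAllNow());
        System.out.println("still open: " + owner.openCount());
    }
}
```

The point of the sketch is only that this path is synchronous and can be
called more than once, so it can sit next to the ordinary shutdown without
interfering with it.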


> - the map to find quarantined UID's should be a 1 [Address] to N list and
> not 1:1 (have a look in akka.remote.EndpointManager.EndpointRegistry)
>

The idea was that on the same address (IP:port) there can be exactly one
UID active and therefore you need to track exactly one quarantined UID. I
still think this is a valid assumption, but let's see what debugging will
show.
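
To make the difference between the two bookkeeping schemes concrete, here
is a small sketch in plain Java. It is not the actual
akka.remote.EndpointManager.EndpointRegistry code: the class and method
names are hypothetical, addresses are modeled as plain strings and UIDs as
ints purely for illustration. It only shows what each variant remembers;
it does not settle which one remoting needs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the actual EndpointRegistry.
class QuarantineTracking {

    // 1:1 variant: one quarantined UID per address.
    // Quarantining a second UID replaces the first one.
    static final class SingleUid {
        private final Map<String, Integer> byAddress = new HashMap<>();

        void quarantine(String address, int uid) {
            byAddress.put(address, uid);
        }

        boolean isQuarantined(String address, int uid) {
            Integer quarantined = byAddress.get(address);
            return quarantined != null && quarantined == uid;
        }
    }

    // 1:N variant: every UID ever quarantined on an address is kept.
    static final class MultiUid {
        private final Map<String, Set<Integer>> byAddress = new HashMap<>();

        void quarantine(String address, int uid) {
            byAddress.computeIfAbsent(address, a -> new HashSet<>()).add(uid);
        }

        boolean isQuarantined(String address, int uid) {
            return byAddress
                .getOrDefault(address, Collections.<Integer>emptySet())
                .contains(uid);
        }
    }

    public static void main(String[] args) {
        SingleUid single = new SingleUid();
        MultiUid multi = new MultiUid();
        for (int uid : new int[] {1, 2}) {
            single.quarantine("host:3552", uid);
            multi.quarantine("host:3552", uid);
        }
        System.out.println("1:1 remembers uid 1: " + single.isQuarantined("host:3552", 1));
        System.out.println("1:N remembers uid 1: " + multi.isQuarantined("host:3552", 1));
    }
}
```

If only one UID can ever be active per address, the two variants answer
the same for the current UID; they differ only in whether an older
quarantined UID is still remembered.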

All in all, this part needs a better design. The original design evolved
too much; eventually it deviated from the initial assumptions and became
too tangled. Now we know this problem space well enough, and we already
have streams, which simplify many of the issues we faced -- I hope we can
start improving remoting quite soon.

Thank you very much for finding the root cause; it saves us quite a bit
of time.

-Endre


>
>  --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
