On Fri, 2020-06-26 at 12:09 -0700, Ryan Schmitt wrote:
> Is there a specific type of corruption or inconsistency that the IO
> Reactor
> is prone to? My concern is that if the IO Reactor can die under _any_
> circumstance, we have to code for that eventuality under _all_
> circumstances, on pain of a serious production outage. To my
> knowledge,
> other async HTTP clients (e.g. those based on Netty) don't have this
> failure mode.
> 

Hi Ryan

One area which is almost definitively prone to corruption in case of
OOM is the connection pool. There can be all sorts of resource leaks if
a lease or release operation terminates unexpectedly.


I also seriously doubt that even basic Java collections can preserve
internal consistency in case of OOM.

Anyway the insane exception handling is one of the reasons why I
_personally_ would not touch anything Netty based with a barge poll.

Oleg 



> On Fri, Jun 26, 2020 at 2:55 AM Oleg Kalnichevski <[email protected]>
> wrote:
> 
> > On Thu, 2020-06-25 at 16:17 -0700, xzer wrote:
> > > Hi,
> > > 
> > > I spent some time in the past week to read the code, then I found
> > > several points:
> > > 
> > > # An Error in FutureCallback or AsyncResponseConsumer will stop
> > > the
> > > IOReactorWorker as there is not Error catch
> > > 
> > > This can be easily reproduced and confirmed in the code, thus I
> > > am
> > > not
> > > providing the reproduce code, but I can submit it later if
> > > required.
> > > 
> > > # suspicion on current Exception handling
> > > 
> > > The main event loop is located at SingleCoreIOReactor#doExecute,
> > > the
> > > main process is like following:
> > > 
> > > 
> > 
> > 
https://github.com/apache/httpcomponents-core/blob/74632227c1d2ae4e994f68b0c9e904ebd1dc7406/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L111
> > > 
> > > while(...){
> > >   closePendingChannels();
> > >   processEvents();
> > >   validateActiveChannels();
> > >   processClosedSessions();
> > >   processPendingChannels();
> > >   processPendingConnectionRequests();
> > > }
> > > 
> > > I noticed that there is no exception handling at the first layer
> > > of
> > > the event loop, and inside the process methods, the Exception
> > > handling
> > > is performed on important points but not covering all execution
> > > paths.
> > > I tried several days to generate an Exception out of the existing
> > > Exception handling coverage but failed, but I am still not very
> > > confident on the possibility of uncaught Exceptions.
> > > 
> > > I don't know the reason why there is no explicit Exception
> > > handling
> > > inside the event loop to guard the loop, and am also curious
> > > about
> > > why
> > > the Error cannot be caught too inside the event loop to keep the
> > > IOReactor running.
> > > 
> > > I assume that the Errors and not covered Exceptions may cause
> > > inconsistency on the IO reactor status (not sure), thus I am
> > > proposing
> > > a robust wrapper on the IO reactor so that we can abandon a
> > > stopped
> > > IO
> > > reactor and create a new one to replace it.
> > > 
> > > Which wrapper, let me call it RobustReactor, will be extended
> > > from
> > > SingleCoreIOReactor or AbstractSingleCoreIOReactor depends if we
> > > want
> > > to handle the SingleCoreListeningIOReactor too. RobustReactor
> > > will
> > > accept a SingleCoreIOReactor supplier by constructor. thus it
> > > will be
> > > transparent to IOReactorWorker. RobustReactor will override the
> > > execute to catch all Throwable and recreate the IOReactor
> > > instance.
> > > 
> > > public RobustReactor(Supplier<SingleCoreIOReactor>
> > > reactorSupplier){}
> > > 
> > > public void execute(){
> > >   try{
> > >       this.delegatee.execute();
> > >   } catch (Throwable t){
> > >       safeclose(this.delegatee);
> > >       this.delegatee = supplier.get();
> > >   }
> > > }
> > > 
> > > I am pursuing the suggestion for the possible better approaches.
> > > 
> > 
> > I personally think that catching `Error`s is a _terribly_ _bad_
> > idea.
> > It does not actually make things any more robust but makes damn
> > sure
> > the i/o layer will end up in a corrupt state in case something
> > abnormal
> > happens.
> > 
> > I am fine with having one global try-catch block over the i/o
> > reactor
> > as long as it catches `Exception`s instead of `Throwable`s or at
> > the
> > very least re-throws `Error`s by default.
> > 
> > You also need to consider the case when attempt to re-create I/O
> > reactors in case of an OOM condition would likely result in more
> > OOMs
> > or, worse, a corrupt I/O reactor instance.
> > 
> > Cheers
> > 
> > Oleg
> > 
> > 
> > 
> > > Best Regards
> > > Rui Liu
> > > Sr SDE | Amazon IN Core CX Tech
> > > 
> > > xzer <[email protected]> 于2020年6月18日周四 下午2:11写道:
> > > > 
> > > > Hi Oleg,
> > > > 
> > > > (Thanks Ryan helped me forwarding the mail)
> > > > 
> > > > Thanks for your reply, I had a glance of current 5.0 codebase,
> > > > looks
> > > > like the way of exception handling has been changed and
> > > > IOReactorExceptionHandler is not required any more, which is a
> > > > pretty
> > > > good change.
> > > > 
> > > > I agree with you it is not likely a problem to the library
> > > > itself
> > > > as
> > > > there is no more alternatives in hand as I mentioned in my
> > > > first
> > > > mail.
> > > > The only problem is that we left the robustness challenge to
> > > > the
> > > > users
> > > > and most users may even not realize it, the other ones who
> > > > realized
> > > > this issue then will be implementing their own recovery
> > > > mechanism
> > > > to
> > > > guard the client instance availability.
> > > > 
> > > > I will take some to learn the current implementation and see if
> > > > there
> > > > is anything we should do more, as you said, the error handling
> > > > may
> > > > still be a challenge.
> > > > 
> > > > Best Regards
> > > > Rui Liu
> > > > Sr SDE | Amazon IN Core CX Tech
> > > > 
> > > > Oleg Kalnichevski <[email protected]> 于2020年6月18日周四 下午1:24写道:
> > > > > 
> > > > > On Thu, 2020-06-18 at 12:55 -0700, Ryan Schmitt wrote:
> > > > > > Oleg, is there anything else we should know about the
> > > > > > details
> > > > > > of this
> > > > > > problem on the 4.x line? It seems like there's more going
> > > > > > on
> > > > > > here
> > > > > > than just
> > > > > > a missing `catch` block.
> > > > > > 
> > > > > 
> > > > > I would not really define it as a problem. It is just a
> > > > > fundamental
> > > > > principle that is is generally better to terminate i/o
> > > > > endpoints
> > > > > in
> > > > > case of an unexpected condition the framework does not know
> > > > > how
> > > > > to
> > > > > gracefully recover from than leaving them running in a half-
> > > > > broken or
> > > > > inconsistent state. If you want to keep endpoints running in
> > > > > case
> > > > > of a
> > > > > unexpected runtime exception in the application layer, so be
> > > > > it,
> > > > > but
> > > > > just apply that it consistently across the entire i/o and
> > > > > protocol
> > > > > layers.
> > > > > 
> > > > > Oleg
> > > > > 
> > > > > 
> > > > > > On Thu, Jun 18, 2020 at 11:48 AM Oleg Kalnichevski <
> > > > > > [email protected]>
> > > > > > wrote:
> > > > > > 
> > > > > > > On Thu, 2020-06-18 at 10:18 -0700, Ryan Schmitt wrote:
> > > > > > > > Forwarding this along, since it's not making it to the
> > > > > > > > list
> > > > > > > > for
> > > > > > > > some
> > > > > > > > reason.
> > > > > > > > 
> > > > > > > > ---------- Forwarded message ---------
> > > > > > > > From: xzer <[email protected]>
> > > > > > > > Date: 2020/6/17/ 10:59PM
> > > > > > > > Subject: About HttpAsyncClient exception handling
> > > > > > > > mechanism
> > > > > > > > To: <[email protected]>
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > I am writing this mail to pursue the possible
> > > > > > > > improvement
> > > > > > > > on the
> > > > > > > > current
> > > > > > > > HttpAsyncClient exception handling mechanism.
> > > > > > > > 
> > > > > > > > We observed many service failures inside our company
> > > > > > > > with
> > > > > > > > the
> > > > > > > > error
> > > > > > > > of “I/O
> > > > > > > > reactor status: STOPPED” when using Apache
> > > > > > > > HttpAsyncClient,
> > > > > > > > we
> > > > > > > > also
> > > > > > > > know
> > > > > > > > the reason is because the IOReactorExceptionHandler is
> > > > > > > > not
> > > > > > > > set
> > > > > > > > correctly.
> > > > > > > > 
> > > > > > > > For reference:
> > > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > 
> > > > > 
> > 
> > 
https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor.html
> > > > > > > > 
> > > > > > > > We noticed that there are several things should be
> > > > > > > > taken
> > > > > > > > into
> > > > > > > > consideration
> > > > > > > > as this kind of "mistake" keeps happening:
> > > > > > > > 
> > > > > > > > - There is almost no guidance information about
> > > > > > > > IOReactorExceptionHandler
> > > > > > > > at the official site except a small paragraph in
> > > > > > > > tutorial.
> > > > > > > > The
> > > > > > > > official
> > > > > > > > example of how to customize and configure the client
> > > > > > > > building
> > > > > > > > does
> > > > > > > > not have
> > > > > > > > IOReactorExceptionHandler configured too.
> > > > > > > > 
> > > > > > > > - The example of quick start is using
> > > > > > > > "HttpAsyncClients.createDefault()" to
> > > > > > > > illustrate the out-of-box usage, but the client
> > > > > > > > instance
> > > > > > > > created
> > > > > > > > by
> > > > > > > > which
> > > > > > > > "quick way" does not have IOReactorExceptionHandler
> > > > > > > > configured
> > > > > > > > too.
> > > > > > > > 
> > > > > > > > - Even we configure the IOReactorExceptionHandler to
> > > > > > > > handle
> > > > > > > > the
> > > > > > > > Exceptions,
> > > > > > > > we still have several concerns:
> > > > > > > >   - Is that safe to resume the IO Reactor
> > > > > > > > unconditionally
> > > > > > > > on any
> > > > > > > > Exception?
> > > > > > > >   - It is still impossible to recover the IO Reactor if
> > > > > > > > there is
> > > > > > > > an
> > > > > > > > Error
> > > > > > > > rather than Exception, which is typically happening
> > > > > > > > when
> > > > > > > > the
> > > > > > > > service
> > > > > > > > is
> > > > > > > > under traffic pressure.
> > > > > > > > 
> > > > > > > > For the final point of Exception/Error recovery, we
> > > > > > > > understand
> > > > > > > > that
> > > > > > > > it is
> > > > > > > > difficult to decide how to handle the Exception/Error
> > > > > > > > at
> > > > > > > > library
> > > > > > > > level as
> > > > > > > > we have less knowledge about the runtime context.
> > > > > > > > However,
> > > > > > > > it is
> > > > > > > > a
> > > > > > > > painful
> > > > > > > > task to developers who have to take the robustness into
> > > > > > > > account.
> > > > > > > > We
> > > > > > > > believe
> > > > > > > > that HttpAsyncClient should provide extra robustness
> > > > > > > > mechanism to
> > > > > > > > simplify
> > > > > > > > the usage.
> > > > > > > > 
> > > > > > > > We have a proposal like following:
> > > > > > > > 
> > > > > > > > - The client instance created by
> > > > > > > > "HttpAsyncClients.createDefault()"
> > > > > > > > should
> > > > > > > > have a default IOReactorExceptionHandler configured.
> > > > > > > > 
> > > > > > > > - The guidance information of setting
> > > > > > > > IOReactorExceptionHandler
> > > > > > > > should be
> > > > > > > > added into the example of customizing.
> > > > > > > > 
> > > > > > > > - We also propose a default strategy of Exception
> > > > > > > > handling
> > > > > > > > as:
> > > > > > > > resume
> > > > > > > > on
> > > > > > > > RuntimeException and crash on IOException, which is
> > > > > > > > based
> > > > > > > > on a
> > > > > > > > hypothesis
> > > > > > > > that RuntimeException is usually caused by user code
> > > > > > > > and
> > > > > > > > IOException
> > > > > > > > is
> > > > > > > > more likely suggesting a underlying network
> > > > > > > > communication
> > > > > > > > failure.
> > > > > > > > 
> > > > > > > > - We also propose a mechanism which will check the
> > > > > > > > IOReactor
> > > > > > > > status
> > > > > > > > and
> > > > > > > > then reinitialize the client instance entirely when
> > > > > > > > uncaught
> > > > > > > > Exception/Error happens.
> > > > > > > > 
> > > > > > > > As the proposal still requires polish and discussion,
> > > > > > > > thus
> > > > > > > > we are
> > > > > > > > sending
> > > > > > > > this mail to ask the opinion from you about it.
> > > > > > > > 
> > > > > > > 
> > > > > > > Hi Rui
> > > > > > > 
> > > > > > > I _think_ most of the technical issues have already been
> > > > > > > addressed
> > > > > > > in
> > > > > > > HC 5.0. I am not sure HC 4.x is worth any major
> > > > > > > refactoring
> > > > > > > efforts
> > > > > > > at
> > > > > > > this point but you are certainly welcome to propose a PR
> > > > > > > for
> > > > > > > the
> > > > > > > 4.x
> > > > > > > release branch. The only potentially contentious issue
> > > > > > > might
> > > > > > > be
> > > > > > > handing
> > > > > > > of Errors but I am sure we can work something out.
> > > > > > > 
> > > > > > > Please do however consider upgrading to HC 5.0
> > > > > > > 
> > > > > > > Cheers
> > > > > > > 
> > > > > > > Oleg
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > -------------------------------------------------------
> > > > > > > ----
> > > > > > > ------
> > > > > > > ----
> > > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > > For additional commands, e-mail: [email protected]
> > > > > > > 
> > > > > > > 
> > > 
> > > ---------------------------------------------------------------
> > > ------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > > 
> > 
> > 
> > -----------------------------------------------------------------
> > ----
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> > 
> > 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to