Hi,

Actually currently most Exception handling uncovered paths are
initializing and closing resources, I am also not sure if the
IOReactor will be in a good state if we just simply resume the loop
for Exceptions.

OOM is a typical case of Error being thrown, and there is a
paradoxical situation:

If the OOM is caused by continuous traffic pressure, or by consistent
bug, then there will be nothing different between just crashing the
reactor and keep recreating a new reactor when crashing, the final
result is the same: the application will stop service.

But more common case is because of occasionally traffic pressure, or
occasional bad request, or bug in a occasional execution path, for
those occasional failure, recreate a reactor would help the
application recover quickly and avoid unnecessary service
interruption.

Current what we are doing internally (based on the old 4.5) is that,
we are checking the IOReactor status before every request, once we
found the reactor is crashed, we will close the current client
instance and create a new client instance for future requests.

There is another concern that I missed in my first mail, the
Future#get returned by client#execute will hang up when the IOReactor
stopped (on both old 4.5 and current new 5.0), then the user
application will also hang up on such situation, we are now replacing
the Future implementation to force timeout on get without timeout
parameter specified.

Another challenge is that not everyone can come to a consensus on the
robust approach, there are also some teams that do not agree the above
approach we have made inside our company.

I am thinking we may need a pluggable mechanism to allow developers to
implement their own robust policy, and of course with a default
strategy shipped with the library.


Oleg Kalnichevski <[email protected]> 于2020年6月26日周五 上午2:55写道:

>
> On Thu, 2020-06-25 at 16:17 -0700, xzer wrote:
> > Hi,
> >
> > I spent some time in the past week to read the code, then I found
> > several points:
> >
> > # An Error in FutureCallback or AsyncResponseConsumer will stop the
> > IOReactorWorker as there is not Error catch
> >
> > This can be easily reproduced and confirmed in the code, thus I am
> > not
> > providing the reproduce code, but I can submit it later if required.
> >
> > # suspicion on current Exception handling
> >
> > The main event loop is located at SingleCoreIOReactor#doExecute, the
> > main process is like following:
> >
> >
> https://github.com/apache/httpcomponents-core/blob/74632227c1d2ae4e994f68b0c9e904ebd1dc7406/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L111
> >
> > while(...){
> >   closePendingChannels();
> >   processEvents();
> >   validateActiveChannels();
> >   processClosedSessions();
> >   processPendingChannels();
> >   processPendingConnectionRequests();
> > }
> >
> > I noticed that there is no exception handling at the first layer of
> > the event loop, and inside the process methods, the Exception
> > handling
> > is performed on important points but not covering all execution
> > paths.
> > I tried several days to generate an Exception out of the existing
> > Exception handling coverage but failed, but I am still not very
> > confident on the possibility of uncaught Exceptions.
> >
> > I don't know the reason why there is no explicit Exception handling
> > inside the event loop to guard the loop, and am also curious about
> > why
> > the Error cannot be caught too inside the event loop to keep the
> > IOReactor running.
> >
> > I assume that the Errors and not covered Exceptions may cause
> > inconsistency on the IO reactor status (not sure), thus I am
> > proposing
> > a robust wrapper on the IO reactor so that we can abandon a stopped
> > IO
> > reactor and create a new one to replace it.
> >
> > Which wrapper, let me call it RobustReactor, will be extended from
> > SingleCoreIOReactor or AbstractSingleCoreIOReactor depends if we want
> > to handle the SingleCoreListeningIOReactor too. RobustReactor will
> > accept a SingleCoreIOReactor supplier by constructor. thus it will be
> > transparent to IOReactorWorker. RobustReactor will override the
> > execute to catch all Throwable and recreate the IOReactor instance.
> >
> > public RobustReactor(Supplier<SingleCoreIOReactor> reactorSupplier){}
> >
> > public void execute(){
> >   try{
> >       this.delegatee.execute();
> >   } catch (Throwable t){
> >       safeclose(this.delegatee);
> >       this.delegatee = supplier.get();
> >   }
> > }
> >
> > I am pursuing the suggestion for the possible better approaches.
> >
>
> I personally think that catching `Error`s is a _terribly_ _bad_ idea.
> It does not actually make things any more robust but makes damn sure
> the i/o layer will end up in a corrupt state in case something abnormal
> happens.
>
> I am fine with having one global try-catch block over the i/o reactor
> as long as it catches `Exception`s instead of `Throwable`s or at the
> very least re-throws `Error`s by default.
>
> You also need to consider the case when attempt to re-create I/O
> reactors in case of an OOM condition would likely result in more OOMs
> or, worse, a corrupt I/O reactor instance.
>
> Cheers
>
> Oleg
>
>
>
> > Best Regards
> > Rui Liu
> > Sr SDE | Amazon IN Core CX Tech
> >
> > xzer <[email protected]> 于2020年6月18日周四 下午2:11写道:
> > >
> > > Hi Oleg,
> > >
> > > (Thanks Ryan helped me forwarding the mail)
> > >
> > > Thanks for your reply, I had a glance of current 5.0 codebase,
> > > looks
> > > like the way of exception handling has been changed and
> > > IOReactorExceptionHandler is not required any more, which is a
> > > pretty
> > > good change.
> > >
> > > I agree with you it is not likely a problem to the library itself
> > > as
> > > there is no more alternatives in hand as I mentioned in my first
> > > mail.
> > > The only problem is that we left the robustness challenge to the
> > > users
> > > and most users may even not realize it, the other ones who realized
> > > this issue then will be implementing their own recovery mechanism
> > > to
> > > guard the client instance availability.
> > >
> > > I will take some to learn the current implementation and see if
> > > there
> > > is anything we should do more, as you said, the error handling may
> > > still be a challenge.
> > >
> > > Best Regards
> > > Rui Liu
> > > Sr SDE | Amazon IN Core CX Tech
> > >
> > > Oleg Kalnichevski <[email protected]> 于2020年6月18日周四 下午1:24写道:
> > > >
> > > > On Thu, 2020-06-18 at 12:55 -0700, Ryan Schmitt wrote:
> > > > > Oleg, is there anything else we should know about the details
> > > > > of this
> > > > > problem on the 4.x line? It seems like there's more going on
> > > > > here
> > > > > than just
> > > > > a missing `catch` block.
> > > > >
> > > >
> > > > I would not really define it as a problem. It is just a
> > > > fundamental
> > > > principle that is is generally better to terminate i/o endpoints
> > > > in
> > > > case of an unexpected condition the framework does not know how
> > > > to
> > > > gracefully recover from than leaving them running in a half-
> > > > broken or
> > > > inconsistent state. If you want to keep endpoints running in case
> > > > of a
> > > > unexpected runtime exception in the application layer, so be it,
> > > > but
> > > > just apply that it consistently across the entire i/o and
> > > > protocol
> > > > layers.
> > > >
> > > > Oleg
> > > >
> > > >
> > > > > On Thu, Jun 18, 2020 at 11:48 AM Oleg Kalnichevski <
> > > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > On Thu, 2020-06-18 at 10:18 -0700, Ryan Schmitt wrote:
> > > > > > > Forwarding this along, since it's not making it to the list
> > > > > > > for
> > > > > > > some
> > > > > > > reason.
> > > > > > >
> > > > > > > ---------- Forwarded message ---------
> > > > > > > From: xzer <[email protected]>
> > > > > > > Date: 2020/6/17/ 10:59PM
> > > > > > > Subject: About HttpAsyncClient exception handling mechanism
> > > > > > > To: <[email protected]>
> > > > > > >
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I am writing this mail to pursue the possible improvement
> > > > > > > on the
> > > > > > > current
> > > > > > > HttpAsyncClient exception handling mechanism.
> > > > > > >
> > > > > > > We observed many service failures inside our company with
> > > > > > > the
> > > > > > > error
> > > > > > > of “I/O
> > > > > > > reactor status: STOPPED” when using Apache HttpAsyncClient,
> > > > > > > we
> > > > > > > also
> > > > > > > know
> > > > > > > the reason is because the IOReactorExceptionHandler is not
> > > > > > > set
> > > > > > > correctly.
> > > > > > >
> > > > > > > For reference:
> > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor.html
> > > > > > >
> > > > > > > We noticed that there are several things should be taken
> > > > > > > into
> > > > > > > consideration
> > > > > > > as this kind of "mistake" keeps happening:
> > > > > > >
> > > > > > > - There is almost no guidance information about
> > > > > > > IOReactorExceptionHandler
> > > > > > > at the official site except a small paragraph in tutorial.
> > > > > > > The
> > > > > > > official
> > > > > > > example of how to customize and configure the client
> > > > > > > building
> > > > > > > does
> > > > > > > not have
> > > > > > > IOReactorExceptionHandler configured too.
> > > > > > >
> > > > > > > - The example of quick start is using
> > > > > > > "HttpAsyncClients.createDefault()" to
> > > > > > > illustrate the out-of-box usage, but the client instance
> > > > > > > created
> > > > > > > by
> > > > > > > which
> > > > > > > "quick way" does not have IOReactorExceptionHandler
> > > > > > > configured
> > > > > > > too.
> > > > > > >
> > > > > > > - Even we configure the IOReactorExceptionHandler to handle
> > > > > > > the
> > > > > > > Exceptions,
> > > > > > > we still have several concerns:
> > > > > > >   - Is that safe to resume the IO Reactor unconditionally
> > > > > > > on any
> > > > > > > Exception?
> > > > > > >   - It is still impossible to recover the IO Reactor if
> > > > > > > there is
> > > > > > > an
> > > > > > > Error
> > > > > > > rather than Exception, which is typically happening when
> > > > > > > the
> > > > > > > service
> > > > > > > is
> > > > > > > under traffic pressure.
> > > > > > >
> > > > > > > For the final point of Exception/Error recovery, we
> > > > > > > understand
> > > > > > > that
> > > > > > > it is
> > > > > > > difficult to decide how to handle the Exception/Error at
> > > > > > > library
> > > > > > > level as
> > > > > > > we have less knowledge about the runtime context. However,
> > > > > > > it is
> > > > > > > a
> > > > > > > painful
> > > > > > > task to developers who have to take the robustness into
> > > > > > > account.
> > > > > > > We
> > > > > > > believe
> > > > > > > that HttpAsyncClient should provide extra robustness
> > > > > > > mechanism to
> > > > > > > simplify
> > > > > > > the usage.
> > > > > > >
> > > > > > > We have a proposal like following:
> > > > > > >
> > > > > > > - The client instance created by
> > > > > > > "HttpAsyncClients.createDefault()"
> > > > > > > should
> > > > > > > have a default IOReactorExceptionHandler configured.
> > > > > > >
> > > > > > > - The guidance information of setting
> > > > > > > IOReactorExceptionHandler
> > > > > > > should be
> > > > > > > added into the example of customizing.
> > > > > > >
> > > > > > > - We also propose a default strategy of Exception handling
> > > > > > > as:
> > > > > > > resume
> > > > > > > on
> > > > > > > RuntimeException and crash on IOException, which is based
> > > > > > > on a
> > > > > > > hypothesis
> > > > > > > that RuntimeException is usually caused by user code and
> > > > > > > IOException
> > > > > > > is
> > > > > > > more likely suggesting a underlying network communication
> > > > > > > failure.
> > > > > > >
> > > > > > > - We also propose a mechanism which will check the
> > > > > > > IOReactor
> > > > > > > status
> > > > > > > and
> > > > > > > then reinitialize the client instance entirely when
> > > > > > > uncaught
> > > > > > > Exception/Error happens.
> > > > > > >
> > > > > > > As the proposal still requires polish and discussion, thus
> > > > > > > we are
> > > > > > > sending
> > > > > > > this mail to ask the opinion from you about it.
> > > > > > >
> > > > > >
> > > > > > Hi Rui
> > > > > >
> > > > > > I _think_ most of the technical issues have already been
> > > > > > addressed
> > > > > > in
> > > > > > HC 5.0. I am not sure HC 4.x is worth any major refactoring
> > > > > > efforts
> > > > > > at
> > > > > > this point but you are certainly welcome to propose a PR for
> > > > > > the
> > > > > > 4.x
> > > > > > release branch. The only potentially contentious issue might
> > > > > > be
> > > > > > handing
> > > > > > of Errors but I am sure we can work something out.
> > > > > >
> > > > > > Please do however consider upgrading to HC 5.0
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > Oleg
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----------------------------------------------------------
> > > > > > ------
> > > > > > ----
> > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > For additional commands, e-mail: [email protected]
> > > > > >
> > > > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to