On Thu, 2020-06-25 at 16:17 -0700, xzer wrote:
> Hi,
> 
> I spent some time in the past week to read the code, then I found
> several points:
> 
> # An Error in FutureCallback or AsyncResponseConsumer will stop the
> IOReactorWorker as there is not Error catch
> 
> This can be easily reproduced and confirmed in the code, thus I am
> not
> providing the reproduce code, but I can submit it later if required.
> 
> # suspicion on current Exception handling
> 
> The main event loop is located at SingleCoreIOReactor#doExecute, the
> main process is like following:
> 
> 
https://github.com/apache/httpcomponents-core/blob/74632227c1d2ae4e994f68b0c9e904ebd1dc7406/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L111
> 
> while(...){
>   closePendingChannels();
>   processEvents();
>   validateActiveChannels();
>   processClosedSessions();
>   processPendingChannels();
>   processPendingConnectionRequests();
> }
> 
> I noticed that there is no exception handling at the first layer of
> the event loop, and inside the process methods, the Exception
> handling
> is performed on important points but not covering all execution
> paths.
> I tried several days to generate an Exception out of the existing
> Exception handling coverage but failed, but I am still not very
> confident on the possibility of uncaught Exceptions.
> 
> I don't know the reason why there is no explicit Exception handling
> inside the event loop to guard the loop, and am also curious about
> why
> the Error cannot be caught too inside the event loop to keep the
> IOReactor running.
> 
> I assume that the Errors and not covered Exceptions may cause
> inconsistency on the IO reactor status (not sure), thus I am
> proposing
> a robust wrapper on the IO reactor so that we can abandon a stopped
> IO
> reactor and create a new one to replace it.
> 
> Which wrapper, let me call it RobustReactor, will be extended from
> SingleCoreIOReactor or AbstractSingleCoreIOReactor depends if we want
> to handle the SingleCoreListeningIOReactor too. RobustReactor will
> accept a SingleCoreIOReactor supplier by constructor. thus it will be
> transparent to IOReactorWorker. RobustReactor will override the
> execute to catch all Throwable and recreate the IOReactor instance.
> 
> public RobustReactor(Supplier<SingleCoreIOReactor> reactorSupplier){}
> 
> public void execute(){
>   try{
>       this.delegatee.execute();
>   } catch (Throwable t){
>       safeclose(this.delegatee);
>       this.delegatee = supplier.get();
>   }
> }
> 
> I am pursuing the suggestion for the possible better approaches.
> 

I personally think that catching `Error`s is a _terribly_ _bad_ idea.
It does not actually make things any more robust but makes damn sure
the i/o layer will end up in a corrupt state in case something abnormal
happens. 

I am fine with having one global try-catch block over the i/o reactor
as long as it catches `Exception`s instead of `Throwable`s or at the
very least re-throws `Error`s by default.

You also need to consider the case when attempt to re-create I/O
reactors in case of an OOM condition would likely result in more OOMs
or, worse, a corrupt I/O reactor instance. 

Cheers

Oleg



> Best Regards
> Rui Liu
> Sr SDE | Amazon IN Core CX Tech
> 
> xzer <[email protected]> 于2020年6月18日周四 下午2:11写道:
> > 
> > Hi Oleg,
> > 
> > (Thanks Ryan helped me forwarding the mail)
> > 
> > Thanks for your reply, I had a glance of current 5.0 codebase,
> > looks
> > like the way of exception handling has been changed and
> > IOReactorExceptionHandler is not required any more, which is a
> > pretty
> > good change.
> > 
> > I agree with you it is not likely a problem to the library itself
> > as
> > there is no more alternatives in hand as I mentioned in my first
> > mail.
> > The only problem is that we left the robustness challenge to the
> > users
> > and most users may even not realize it, the other ones who realized
> > this issue then will be implementing their own recovery mechanism
> > to
> > guard the client instance availability.
> > 
> > I will take some to learn the current implementation and see if
> > there
> > is anything we should do more, as you said, the error handling may
> > still be a challenge.
> > 
> > Best Regards
> > Rui Liu
> > Sr SDE | Amazon IN Core CX Tech
> > 
> > Oleg Kalnichevski <[email protected]> 于2020年6月18日周四 下午1:24写道:
> > > 
> > > On Thu, 2020-06-18 at 12:55 -0700, Ryan Schmitt wrote:
> > > > Oleg, is there anything else we should know about the details
> > > > of this
> > > > problem on the 4.x line? It seems like there's more going on
> > > > here
> > > > than just
> > > > a missing `catch` block.
> > > > 
> > > 
> > > I would not really define it as a problem. It is just a
> > > fundamental
> > > principle that is is generally better to terminate i/o endpoints
> > > in
> > > case of an unexpected condition the framework does not know how
> > > to
> > > gracefully recover from than leaving them running in a half-
> > > broken or
> > > inconsistent state. If you want to keep endpoints running in case
> > > of a
> > > unexpected runtime exception in the application layer, so be it,
> > > but
> > > just apply that it consistently across the entire i/o and
> > > protocol
> > > layers.
> > > 
> > > Oleg
> > > 
> > > 
> > > > On Thu, Jun 18, 2020 at 11:48 AM Oleg Kalnichevski <
> > > > [email protected]>
> > > > wrote:
> > > > 
> > > > > On Thu, 2020-06-18 at 10:18 -0700, Ryan Schmitt wrote:
> > > > > > Forwarding this along, since it's not making it to the list
> > > > > > for
> > > > > > some
> > > > > > reason.
> > > > > > 
> > > > > > ---------- Forwarded message ---------
> > > > > > From: xzer <[email protected]>
> > > > > > Date: 2020/6/17/ 10:59PM
> > > > > > Subject: About HttpAsyncClient exception handling mechanism
> > > > > > To: <[email protected]>
> > > > > > 
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > I am writing this mail to pursue the possible improvement
> > > > > > on the
> > > > > > current
> > > > > > HttpAsyncClient exception handling mechanism.
> > > > > > 
> > > > > > We observed many service failures inside our company with
> > > > > > the
> > > > > > error
> > > > > > of “I/O
> > > > > > reactor status: STOPPED” when using Apache HttpAsyncClient,
> > > > > > we
> > > > > > also
> > > > > > know
> > > > > > the reason is because the IOReactorExceptionHandler is not
> > > > > > set
> > > > > > correctly.
> > > > > > 
> > > > > > For reference:
> > > > > > 
> > > > > 
> > > > > 
> > > 
> > > 
https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor.html
> > > > > > 
> > > > > > We noticed that there are several things should be taken
> > > > > > into
> > > > > > consideration
> > > > > > as this kind of "mistake" keeps happening:
> > > > > > 
> > > > > > - There is almost no guidance information about
> > > > > > IOReactorExceptionHandler
> > > > > > at the official site except a small paragraph in tutorial.
> > > > > > The
> > > > > > official
> > > > > > example of how to customize and configure the client
> > > > > > building
> > > > > > does
> > > > > > not have
> > > > > > IOReactorExceptionHandler configured too.
> > > > > > 
> > > > > > - The example of quick start is using
> > > > > > "HttpAsyncClients.createDefault()" to
> > > > > > illustrate the out-of-box usage, but the client instance
> > > > > > created
> > > > > > by
> > > > > > which
> > > > > > "quick way" does not have IOReactorExceptionHandler
> > > > > > configured
> > > > > > too.
> > > > > > 
> > > > > > - Even we configure the IOReactorExceptionHandler to handle
> > > > > > the
> > > > > > Exceptions,
> > > > > > we still have several concerns:
> > > > > >   - Is that safe to resume the IO Reactor unconditionally
> > > > > > on any
> > > > > > Exception?
> > > > > >   - It is still impossible to recover the IO Reactor if
> > > > > > there is
> > > > > > an
> > > > > > Error
> > > > > > rather than Exception, which is typically happening when
> > > > > > the
> > > > > > service
> > > > > > is
> > > > > > under traffic pressure.
> > > > > > 
> > > > > > For the final point of Exception/Error recovery, we
> > > > > > understand
> > > > > > that
> > > > > > it is
> > > > > > difficult to decide how to handle the Exception/Error at
> > > > > > library
> > > > > > level as
> > > > > > we have less knowledge about the runtime context. However,
> > > > > > it is
> > > > > > a
> > > > > > painful
> > > > > > task to developers who have to take the robustness into
> > > > > > account.
> > > > > > We
> > > > > > believe
> > > > > > that HttpAsyncClient should provide extra robustness
> > > > > > mechanism to
> > > > > > simplify
> > > > > > the usage.
> > > > > > 
> > > > > > We have a proposal like following:
> > > > > > 
> > > > > > - The client instance created by
> > > > > > "HttpAsyncClients.createDefault()"
> > > > > > should
> > > > > > have a default IOReactorExceptionHandler configured.
> > > > > > 
> > > > > > - The guidance information of setting
> > > > > > IOReactorExceptionHandler
> > > > > > should be
> > > > > > added into the example of customizing.
> > > > > > 
> > > > > > - We also propose a default strategy of Exception handling
> > > > > > as:
> > > > > > resume
> > > > > > on
> > > > > > RuntimeException and crash on IOException, which is based
> > > > > > on a
> > > > > > hypothesis
> > > > > > that RuntimeException is usually caused by user code and
> > > > > > IOException
> > > > > > is
> > > > > > more likely suggesting a underlying network communication
> > > > > > failure.
> > > > > > 
> > > > > > - We also propose a mechanism which will check the
> > > > > > IOReactor
> > > > > > status
> > > > > > and
> > > > > > then reinitialize the client instance entirely when
> > > > > > uncaught
> > > > > > Exception/Error happens.
> > > > > > 
> > > > > > As the proposal still requires polish and discussion, thus
> > > > > > we are
> > > > > > sending
> > > > > > this mail to ask the opinion from you about it.
> > > > > > 
> > > > > 
> > > > > Hi Rui
> > > > > 
> > > > > I _think_ most of the technical issues have already been
> > > > > addressed
> > > > > in
> > > > > HC 5.0. I am not sure HC 4.x is worth any major refactoring
> > > > > efforts
> > > > > at
> > > > > this point but you are certainly welcome to propose a PR for
> > > > > the
> > > > > 4.x
> > > > > release branch. The only potentially contentious issue might
> > > > > be
> > > > > handing
> > > > > of Errors but I am sure we can work something out.
> > > > > 
> > > > > Please do however consider upgrading to HC 5.0
> > > > > 
> > > > > Cheers
> > > > > 
> > > > > Oleg
> > > > > 
> > > > > 
> > > > > 
> > > > > -----------------------------------------------------------
> > > > > ------
> > > > > ----
> > > > > To unsubscribe, e-mail: [email protected]
> > > > > For additional commands, e-mail: [email protected]
> > > > > 
> > > > > 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to