Hi, I spent some time in the past week to read the code, then I found several points:
# An Error in FutureCallback or AsyncResponseConsumer will stop the IOReactorWorker as there is not Error catch This can be easily reproduced and confirmed in the code, thus I am not providing the reproduce code, but I can submit it later if required. # suspicion on current Exception handling The main event loop is located at SingleCoreIOReactor#doExecute, the main process is like following: https://github.com/apache/httpcomponents-core/blob/74632227c1d2ae4e994f68b0c9e904ebd1dc7406/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L111 while(...){ closePendingChannels(); processEvents(); validateActiveChannels(); processClosedSessions(); processPendingChannels(); processPendingConnectionRequests(); } I noticed that there is no exception handling at the first layer of the event loop, and inside the process methods, the Exception handling is performed on important points but not covering all execution paths. I tried several days to generate an Exception out of the existing Exception handling coverage but failed, but I am still not very confident on the possibility of uncaught Exceptions. I don't know the reason why there is no explicit Exception handling inside the event loop to guard the loop, and am also curious about why the Error cannot be caught too inside the event loop to keep the IOReactor running. I assume that the Errors and not covered Exceptions may cause inconsistency on the IO reactor status (not sure), thus I am proposing a robust wrapper on the IO reactor so that we can abandon a stopped IO reactor and create a new one to replace it. Which wrapper, let me call it RobustReactor, will be extended from SingleCoreIOReactor or AbstractSingleCoreIOReactor depends if we want to handle the SingleCoreListeningIOReactor too. RobustReactor will accept a SingleCoreIOReactor supplier by constructor. thus it will be transparent to IOReactorWorker. RobustReactor will override the execute to catch all Throwable and recreate the IOReactor instance. public RobustReactor(Supplier<SingleCoreIOReactor> reactorSupplier){} public void execute(){ try{ this.delegatee.execute(); } catch (Throwable t){ safeclose(this.delegatee); this.delegatee = supplier.get(); } } I am pursuing the suggestion for the possible better approaches. Best Regards Rui Liu Sr SDE | Amazon IN Core CX Tech xzer <[email protected]> 于2020年6月18日周四 下午2:11写道: > > Hi Oleg, > > (Thanks Ryan helped me forwarding the mail) > > Thanks for your reply, I had a glance of current 5.0 codebase, looks > like the way of exception handling has been changed and > IOReactorExceptionHandler is not required any more, which is a pretty > good change. > > I agree with you it is not likely a problem to the library itself as > there is no more alternatives in hand as I mentioned in my first mail. > The only problem is that we left the robustness challenge to the users > and most users may even not realize it, the other ones who realized > this issue then will be implementing their own recovery mechanism to > guard the client instance availability. > > I will take some to learn the current implementation and see if there > is anything we should do more, as you said, the error handling may > still be a challenge. > > Best Regards > Rui Liu > Sr SDE | Amazon IN Core CX Tech > > Oleg Kalnichevski <[email protected]> 于2020年6月18日周四 下午1:24写道: > > > > On Thu, 2020-06-18 at 12:55 -0700, Ryan Schmitt wrote: > > > Oleg, is there anything else we should know about the details of this > > > problem on the 4.x line? It seems like there's more going on here > > > than just > > > a missing `catch` block. > > > > > > > I would not really define it as a problem. It is just a fundamental > > principle that is is generally better to terminate i/o endpoints in > > case of an unexpected condition the framework does not know how to > > gracefully recover from than leaving them running in a half-broken or > > inconsistent state. If you want to keep endpoints running in case of a > > unexpected runtime exception in the application layer, so be it, but > > just apply that it consistently across the entire i/o and protocol > > layers. > > > > Oleg > > > > > > > On Thu, Jun 18, 2020 at 11:48 AM Oleg Kalnichevski <[email protected]> > > > wrote: > > > > > > > On Thu, 2020-06-18 at 10:18 -0700, Ryan Schmitt wrote: > > > > > Forwarding this along, since it's not making it to the list for > > > > > some > > > > > reason. > > > > > > > > > > ---------- Forwarded message --------- > > > > > From: xzer <[email protected]> > > > > > Date: 2020/6/17/ 10:59PM > > > > > Subject: About HttpAsyncClient exception handling mechanism > > > > > To: <[email protected]> > > > > > > > > > > > > > > > Hello, > > > > > > > > > > I am writing this mail to pursue the possible improvement on the > > > > > current > > > > > HttpAsyncClient exception handling mechanism. > > > > > > > > > > We observed many service failures inside our company with the > > > > > error > > > > > of “I/O > > > > > reactor status: STOPPED” when using Apache HttpAsyncClient, we > > > > > also > > > > > know > > > > > the reason is because the IOReactorExceptionHandler is not set > > > > > correctly. > > > > > > > > > > For reference: > > > > > > > > > > > > > > > https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor.html > > > > > > > > > > We noticed that there are several things should be taken into > > > > > consideration > > > > > as this kind of "mistake" keeps happening: > > > > > > > > > > - There is almost no guidance information about > > > > > IOReactorExceptionHandler > > > > > at the official site except a small paragraph in tutorial. The > > > > > official > > > > > example of how to customize and configure the client building > > > > > does > > > > > not have > > > > > IOReactorExceptionHandler configured too. > > > > > > > > > > - The example of quick start is using > > > > > "HttpAsyncClients.createDefault()" to > > > > > illustrate the out-of-box usage, but the client instance created > > > > > by > > > > > which > > > > > "quick way" does not have IOReactorExceptionHandler configured > > > > > too. > > > > > > > > > > - Even we configure the IOReactorExceptionHandler to handle the > > > > > Exceptions, > > > > > we still have several concerns: > > > > > - Is that safe to resume the IO Reactor unconditionally on any > > > > > Exception? > > > > > - It is still impossible to recover the IO Reactor if there is > > > > > an > > > > > Error > > > > > rather than Exception, which is typically happening when the > > > > > service > > > > > is > > > > > under traffic pressure. > > > > > > > > > > For the final point of Exception/Error recovery, we understand > > > > > that > > > > > it is > > > > > difficult to decide how to handle the Exception/Error at library > > > > > level as > > > > > we have less knowledge about the runtime context. However, it is > > > > > a > > > > > painful > > > > > task to developers who have to take the robustness into account. > > > > > We > > > > > believe > > > > > that HttpAsyncClient should provide extra robustness mechanism to > > > > > simplify > > > > > the usage. > > > > > > > > > > We have a proposal like following: > > > > > > > > > > - The client instance created by > > > > > "HttpAsyncClients.createDefault()" > > > > > should > > > > > have a default IOReactorExceptionHandler configured. > > > > > > > > > > - The guidance information of setting IOReactorExceptionHandler > > > > > should be > > > > > added into the example of customizing. > > > > > > > > > > - We also propose a default strategy of Exception handling as: > > > > > resume > > > > > on > > > > > RuntimeException and crash on IOException, which is based on a > > > > > hypothesis > > > > > that RuntimeException is usually caused by user code and > > > > > IOException > > > > > is > > > > > more likely suggesting a underlying network communication > > > > > failure. > > > > > > > > > > - We also propose a mechanism which will check the IOReactor > > > > > status > > > > > and > > > > > then reinitialize the client instance entirely when uncaught > > > > > Exception/Error happens. > > > > > > > > > > As the proposal still requires polish and discussion, thus we are > > > > > sending > > > > > this mail to ask the opinion from you about it. > > > > > > > > > > > > > Hi Rui > > > > > > > > I _think_ most of the technical issues have already been addressed > > > > in > > > > HC 5.0. I am not sure HC 4.x is worth any major refactoring efforts > > > > at > > > > this point but you are certainly welcome to propose a PR for the > > > > 4.x > > > > release branch. The only potentially contentious issue might be > > > > handing > > > > of Errors but I am sure we can work something out. > > > > > > > > Please do however consider upgrading to HC 5.0 > > > > > > > > Cheers > > > > > > > > Oleg > > > > > > > > > > > > > > > > ----------------------------------------------------------------- > > > > ---- > > > > To unsubscribe, e-mail: [email protected] > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
