On Thu, 2020-06-25 at 16:17 -0700, xzer wrote: > Hi, > > I spent some time in the past week to read the code, then I found > several points: > > # An Error in FutureCallback or AsyncResponseConsumer will stop the > IOReactorWorker as there is not Error catch > > This can be easily reproduced and confirmed in the code, thus I am > not > providing the reproduce code, but I can submit it later if required. > > # suspicion on current Exception handling > > The main event loop is located at SingleCoreIOReactor#doExecute, the > main process is like following: > > https://github.com/apache/httpcomponents-core/blob/74632227c1d2ae4e994f68b0c9e904ebd1dc7406/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L111 > > while(...){ > closePendingChannels(); > processEvents(); > validateActiveChannels(); > processClosedSessions(); > processPendingChannels(); > processPendingConnectionRequests(); > } > > I noticed that there is no exception handling at the first layer of > the event loop, and inside the process methods, the Exception > handling > is performed on important points but not covering all execution > paths. > I tried several days to generate an Exception out of the existing > Exception handling coverage but failed, but I am still not very > confident on the possibility of uncaught Exceptions. > > I don't know the reason why there is no explicit Exception handling > inside the event loop to guard the loop, and am also curious about > why > the Error cannot be caught too inside the event loop to keep the > IOReactor running. > > I assume that the Errors and not covered Exceptions may cause > inconsistency on the IO reactor status (not sure), thus I am > proposing > a robust wrapper on the IO reactor so that we can abandon a stopped > IO > reactor and create a new one to replace it. > > Which wrapper, let me call it RobustReactor, will be extended from > SingleCoreIOReactor or AbstractSingleCoreIOReactor depends if we want > to handle the SingleCoreListeningIOReactor too. RobustReactor will > accept a SingleCoreIOReactor supplier by constructor. thus it will be > transparent to IOReactorWorker. RobustReactor will override the > execute to catch all Throwable and recreate the IOReactor instance. > > public RobustReactor(Supplier<SingleCoreIOReactor> reactorSupplier){} > > public void execute(){ > try{ > this.delegatee.execute(); > } catch (Throwable t){ > safeclose(this.delegatee); > this.delegatee = supplier.get(); > } > } > > I am pursuing the suggestion for the possible better approaches. >
I personally think that catching `Error`s is a _terribly_ _bad_ idea. It does not actually make things any more robust but makes damn sure the i/o layer will end up in a corrupt state in case something abnormal happens. I am fine with having one global try-catch block over the i/o reactor as long as it catches `Exception`s instead of `Throwable`s or at the very least re-throws `Error`s by default. You also need to consider the case when attempt to re-create I/O reactors in case of an OOM condition would likely result in more OOMs or, worse, a corrupt I/O reactor instance. Cheers Oleg > Best Regards > Rui Liu > Sr SDE | Amazon IN Core CX Tech > > xzer <[email protected]> 于2020年6月18日周四 下午2:11写道: > > > > Hi Oleg, > > > > (Thanks Ryan helped me forwarding the mail) > > > > Thanks for your reply, I had a glance of current 5.0 codebase, > > looks > > like the way of exception handling has been changed and > > IOReactorExceptionHandler is not required any more, which is a > > pretty > > good change. > > > > I agree with you it is not likely a problem to the library itself > > as > > there is no more alternatives in hand as I mentioned in my first > > mail. > > The only problem is that we left the robustness challenge to the > > users > > and most users may even not realize it, the other ones who realized > > this issue then will be implementing their own recovery mechanism > > to > > guard the client instance availability. > > > > I will take some to learn the current implementation and see if > > there > > is anything we should do more, as you said, the error handling may > > still be a challenge. > > > > Best Regards > > Rui Liu > > Sr SDE | Amazon IN Core CX Tech > > > > Oleg Kalnichevski <[email protected]> 于2020年6月18日周四 下午1:24写道: > > > > > > On Thu, 2020-06-18 at 12:55 -0700, Ryan Schmitt wrote: > > > > Oleg, is there anything else we should know about the details > > > > of this > > > > problem on the 4.x line? It seems like there's more going on > > > > here > > > > than just > > > > a missing `catch` block. > > > > > > > > > > I would not really define it as a problem. It is just a > > > fundamental > > > principle that is is generally better to terminate i/o endpoints > > > in > > > case of an unexpected condition the framework does not know how > > > to > > > gracefully recover from than leaving them running in a half- > > > broken or > > > inconsistent state. If you want to keep endpoints running in case > > > of a > > > unexpected runtime exception in the application layer, so be it, > > > but > > > just apply that it consistently across the entire i/o and > > > protocol > > > layers. > > > > > > Oleg > > > > > > > > > > On Thu, Jun 18, 2020 at 11:48 AM Oleg Kalnichevski < > > > > [email protected]> > > > > wrote: > > > > > > > > > On Thu, 2020-06-18 at 10:18 -0700, Ryan Schmitt wrote: > > > > > > Forwarding this along, since it's not making it to the list > > > > > > for > > > > > > some > > > > > > reason. > > > > > > > > > > > > ---------- Forwarded message --------- > > > > > > From: xzer <[email protected]> > > > > > > Date: 2020/6/17/ 10:59PM > > > > > > Subject: About HttpAsyncClient exception handling mechanism > > > > > > To: <[email protected]> > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > I am writing this mail to pursue the possible improvement > > > > > > on the > > > > > > current > > > > > > HttpAsyncClient exception handling mechanism. > > > > > > > > > > > > We observed many service failures inside our company with > > > > > > the > > > > > > error > > > > > > of “I/O > > > > > > reactor status: STOPPED” when using Apache HttpAsyncClient, > > > > > > we > > > > > > also > > > > > > know > > > > > > the reason is because the IOReactorExceptionHandler is not > > > > > > set > > > > > > correctly. > > > > > > > > > > > > For reference: > > > > > > > > > > > > > > > > > > > > > > https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor.html > > > > > > > > > > > > We noticed that there are several things should be taken > > > > > > into > > > > > > consideration > > > > > > as this kind of "mistake" keeps happening: > > > > > > > > > > > > - There is almost no guidance information about > > > > > > IOReactorExceptionHandler > > > > > > at the official site except a small paragraph in tutorial. > > > > > > The > > > > > > official > > > > > > example of how to customize and configure the client > > > > > > building > > > > > > does > > > > > > not have > > > > > > IOReactorExceptionHandler configured too. > > > > > > > > > > > > - The example of quick start is using > > > > > > "HttpAsyncClients.createDefault()" to > > > > > > illustrate the out-of-box usage, but the client instance > > > > > > created > > > > > > by > > > > > > which > > > > > > "quick way" does not have IOReactorExceptionHandler > > > > > > configured > > > > > > too. > > > > > > > > > > > > - Even we configure the IOReactorExceptionHandler to handle > > > > > > the > > > > > > Exceptions, > > > > > > we still have several concerns: > > > > > > - Is that safe to resume the IO Reactor unconditionally > > > > > > on any > > > > > > Exception? > > > > > > - It is still impossible to recover the IO Reactor if > > > > > > there is > > > > > > an > > > > > > Error > > > > > > rather than Exception, which is typically happening when > > > > > > the > > > > > > service > > > > > > is > > > > > > under traffic pressure. > > > > > > > > > > > > For the final point of Exception/Error recovery, we > > > > > > understand > > > > > > that > > > > > > it is > > > > > > difficult to decide how to handle the Exception/Error at > > > > > > library > > > > > > level as > > > > > > we have less knowledge about the runtime context. However, > > > > > > it is > > > > > > a > > > > > > painful > > > > > > task to developers who have to take the robustness into > > > > > > account. > > > > > > We > > > > > > believe > > > > > > that HttpAsyncClient should provide extra robustness > > > > > > mechanism to > > > > > > simplify > > > > > > the usage. > > > > > > > > > > > > We have a proposal like following: > > > > > > > > > > > > - The client instance created by > > > > > > "HttpAsyncClients.createDefault()" > > > > > > should > > > > > > have a default IOReactorExceptionHandler configured. > > > > > > > > > > > > - The guidance information of setting > > > > > > IOReactorExceptionHandler > > > > > > should be > > > > > > added into the example of customizing. > > > > > > > > > > > > - We also propose a default strategy of Exception handling > > > > > > as: > > > > > > resume > > > > > > on > > > > > > RuntimeException and crash on IOException, which is based > > > > > > on a > > > > > > hypothesis > > > > > > that RuntimeException is usually caused by user code and > > > > > > IOException > > > > > > is > > > > > > more likely suggesting a underlying network communication > > > > > > failure. > > > > > > > > > > > > - We also propose a mechanism which will check the > > > > > > IOReactor > > > > > > status > > > > > > and > > > > > > then reinitialize the client instance entirely when > > > > > > uncaught > > > > > > Exception/Error happens. > > > > > > > > > > > > As the proposal still requires polish and discussion, thus > > > > > > we are > > > > > > sending > > > > > > this mail to ask the opinion from you about it. > > > > > > > > > > > > > > > > Hi Rui > > > > > > > > > > I _think_ most of the technical issues have already been > > > > > addressed > > > > > in > > > > > HC 5.0. I am not sure HC 4.x is worth any major refactoring > > > > > efforts > > > > > at > > > > > this point but you are certainly welcome to propose a PR for > > > > > the > > > > > 4.x > > > > > release branch. The only potentially contentious issue might > > > > > be > > > > > handing > > > > > of Errors but I am sure we can work something out. > > > > > > > > > > Please do however consider upgrading to HC 5.0 > > > > > > > > > > Cheers > > > > > > > > > > Oleg > > > > > > > > > > > > > > > > > > > > ----------------------------------------------------------- > > > > > ------ > > > > > ---- > > > > > To unsubscribe, e-mail: [email protected] > > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
