On Fri, 2020-06-26 at 12:20 -0700, xzer wrote: > Hi, > > Actually currently most Exception handling uncovered paths are > initializing and closing resources, I am also not sure if the > IOReactor will be in a good state if we just simply resume the loop > for Exceptions. > > OOM is a typical case of Error being thrown, and there is a > paradoxical situation: > > If the OOM is caused by continuous traffic pressure, or by consistent > bug, then there will be nothing different between just crashing the > reactor and keep recreating a new reactor when crashing, the final > result is the same: the application will stop service. > > But more common case is because of occasionally traffic pressure, or > occasional bad request, or bug in a occasional execution path, for > those occasional failure, recreate a reactor would help the > application recover quickly and avoid unnecessary service > interruption. > > Current what we are doing internally (based on the old 4.5) is that, > we are checking the IOReactor status before every request, once we > found the reactor is crashed, we will close the current client > instance and create a new client instance for future requests. > > There is another concern that I missed in my first mail, the > Future#get returned by client#execute will hang up when the IOReactor > stopped (on both old 4.5 and current new 5.0), then the user > application will also hang up on such situation, we are now replacing > the Future implementation to force timeout on get without timeout > parameter specified. > > Another challenge is that not everyone can come to a consensus on the > robust approach, there are also some teams that do not agree the > above > approach we have made inside our company. > > I am thinking we may need a pluggable mechanism to allow developers > to > implement their own robust policy, and of course with a default > strategy shipped with the library. >
I suppose there will never be a consensus on this particular matter but I am perfectly fine if you propose a pluggable mechanism that would enable the users to define their own policy based on their application requirements and personal preferences. Go for it. Cheers Oleg > > Oleg Kalnichevski <[email protected]> 于2020年6月26日周五 上午2:55写道: > > > > > On Thu, 2020-06-25 at 16:17 -0700, xzer wrote: > > > Hi, > > > > > > I spent some time in the past week to read the code, then I found > > > several points: > > > > > > # An Error in FutureCallback or AsyncResponseConsumer will stop > > > the > > > IOReactorWorker as there is not Error catch > > > > > > This can be easily reproduced and confirmed in the code, thus I > > > am > > > not > > > providing the reproduce code, but I can submit it later if > > > required. > > > > > > # suspicion on current Exception handling > > > > > > The main event loop is located at SingleCoreIOReactor#doExecute, > > > the > > > main process is like following: > > > > > > > > > > https://github.com/apache/httpcomponents-core/blob/74632227c1d2ae4e994f68b0c9e904ebd1dc7406/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L111 > > > > > > while(...){ > > > closePendingChannels(); > > > processEvents(); > > > validateActiveChannels(); > > > processClosedSessions(); > > > processPendingChannels(); > > > processPendingConnectionRequests(); > > > } > > > > > > I noticed that there is no exception handling at the first layer > > > of > > > the event loop, and inside the process methods, the Exception > > > handling > > > is performed on important points but not covering all execution > > > paths. > > > I tried several days to generate an Exception out of the existing > > > Exception handling coverage but failed, but I am still not very > > > confident on the possibility of uncaught Exceptions. > > > > > > I don't know the reason why there is no explicit Exception > > > handling > > > inside the event loop to guard the loop, and am also curious > > > about > > > why > > > the Error cannot be caught too inside the event loop to keep the > > > IOReactor running. > > > > > > I assume that the Errors and not covered Exceptions may cause > > > inconsistency on the IO reactor status (not sure), thus I am > > > proposing > > > a robust wrapper on the IO reactor so that we can abandon a > > > stopped > > > IO > > > reactor and create a new one to replace it. > > > > > > Which wrapper, let me call it RobustReactor, will be extended > > > from > > > SingleCoreIOReactor or AbstractSingleCoreIOReactor depends if we > > > want > > > to handle the SingleCoreListeningIOReactor too. RobustReactor > > > will > > > accept a SingleCoreIOReactor supplier by constructor. thus it > > > will be > > > transparent to IOReactorWorker. RobustReactor will override the > > > execute to catch all Throwable and recreate the IOReactor > > > instance. > > > > > > public RobustReactor(Supplier<SingleCoreIOReactor> > > > reactorSupplier){} > > > > > > public void execute(){ > > > try{ > > > this.delegatee.execute(); > > > } catch (Throwable t){ > > > safeclose(this.delegatee); > > > this.delegatee = supplier.get(); > > > } > > > } > > > > > > I am pursuing the suggestion for the possible better approaches. > > > > > > > I personally think that catching `Error`s is a _terribly_ _bad_ > > idea. > > It does not actually make things any more robust but makes damn > > sure > > the i/o layer will end up in a corrupt state in case something > > abnormal > > happens. > > > > I am fine with having one global try-catch block over the i/o > > reactor > > as long as it catches `Exception`s instead of `Throwable`s or at > > the > > very least re-throws `Error`s by default. > > > > You also need to consider the case when attempt to re-create I/O > > reactors in case of an OOM condition would likely result in more > > OOMs > > or, worse, a corrupt I/O reactor instance. > > > > Cheers > > > > Oleg > > > > > > > > > Best Regards > > > Rui Liu > > > Sr SDE | Amazon IN Core CX Tech > > > > > > xzer <[email protected]> 于2020年6月18日周四 下午2:11写道: > > > > > > > > Hi Oleg, > > > > > > > > (Thanks Ryan helped me forwarding the mail) > > > > > > > > Thanks for your reply, I had a glance of current 5.0 codebase, > > > > looks > > > > like the way of exception handling has been changed and > > > > IOReactorExceptionHandler is not required any more, which is a > > > > pretty > > > > good change. > > > > > > > > I agree with you it is not likely a problem to the library > > > > itself > > > > as > > > > there is no more alternatives in hand as I mentioned in my > > > > first > > > > mail. > > > > The only problem is that we left the robustness challenge to > > > > the > > > > users > > > > and most users may even not realize it, the other ones who > > > > realized > > > > this issue then will be implementing their own recovery > > > > mechanism > > > > to > > > > guard the client instance availability. > > > > > > > > I will take some to learn the current implementation and see if > > > > there > > > > is anything we should do more, as you said, the error handling > > > > may > > > > still be a challenge. > > > > > > > > Best Regards > > > > Rui Liu > > > > Sr SDE | Amazon IN Core CX Tech > > > > > > > > Oleg Kalnichevski <[email protected]> 于2020年6月18日周四 下午1:24写道: > > > > > > > > > > On Thu, 2020-06-18 at 12:55 -0700, Ryan Schmitt wrote: > > > > > > Oleg, is there anything else we should know about the > > > > > > details > > > > > > of this > > > > > > problem on the 4.x line? It seems like there's more going > > > > > > on > > > > > > here > > > > > > than just > > > > > > a missing `catch` block. > > > > > > > > > > > > > > > > I would not really define it as a problem. It is just a > > > > > fundamental > > > > > principle that is is generally better to terminate i/o > > > > > endpoints > > > > > in > > > > > case of an unexpected condition the framework does not know > > > > > how > > > > > to > > > > > gracefully recover from than leaving them running in a half- > > > > > broken or > > > > > inconsistent state. If you want to keep endpoints running in > > > > > case > > > > > of a > > > > > unexpected runtime exception in the application layer, so be > > > > > it, > > > > > but > > > > > just apply that it consistently across the entire i/o and > > > > > protocol > > > > > layers. > > > > > > > > > > Oleg > > > > > > > > > > > > > > > > On Thu, Jun 18, 2020 at 11:48 AM Oleg Kalnichevski < > > > > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > On Thu, 2020-06-18 at 10:18 -0700, Ryan Schmitt wrote: > > > > > > > > Forwarding this along, since it's not making it to the > > > > > > > > list > > > > > > > > for > > > > > > > > some > > > > > > > > reason. > > > > > > > > > > > > > > > > ---------- Forwarded message --------- > > > > > > > > From: xzer <[email protected]> > > > > > > > > Date: 2020/6/17/ 10:59PM > > > > > > > > Subject: About HttpAsyncClient exception handling > > > > > > > > mechanism > > > > > > > > To: <[email protected]> > > > > > > > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > I am writing this mail to pursue the possible > > > > > > > > improvement > > > > > > > > on the > > > > > > > > current > > > > > > > > HttpAsyncClient exception handling mechanism. > > > > > > > > > > > > > > > > We observed many service failures inside our company > > > > > > > > with > > > > > > > > the > > > > > > > > error > > > > > > > > of “I/O > > > > > > > > reactor status: STOPPED” when using Apache > > > > > > > > HttpAsyncClient, > > > > > > > > we > > > > > > > > also > > > > > > > > know > > > > > > > > the reason is because the IOReactorExceptionHandler is > > > > > > > > not > > > > > > > > set > > > > > > > > correctly. > > > > > > > > > > > > > > > > For reference: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor.html > > > > > > > > > > > > > > > > We noticed that there are several things should be > > > > > > > > taken > > > > > > > > into > > > > > > > > consideration > > > > > > > > as this kind of "mistake" keeps happening: > > > > > > > > > > > > > > > > - There is almost no guidance information about > > > > > > > > IOReactorExceptionHandler > > > > > > > > at the official site except a small paragraph in > > > > > > > > tutorial. > > > > > > > > The > > > > > > > > official > > > > > > > > example of how to customize and configure the client > > > > > > > > building > > > > > > > > does > > > > > > > > not have > > > > > > > > IOReactorExceptionHandler configured too. > > > > > > > > > > > > > > > > - The example of quick start is using > > > > > > > > "HttpAsyncClients.createDefault()" to > > > > > > > > illustrate the out-of-box usage, but the client > > > > > > > > instance > > > > > > > > created > > > > > > > > by > > > > > > > > which > > > > > > > > "quick way" does not have IOReactorExceptionHandler > > > > > > > > configured > > > > > > > > too. > > > > > > > > > > > > > > > > - Even we configure the IOReactorExceptionHandler to > > > > > > > > handle > > > > > > > > the > > > > > > > > Exceptions, > > > > > > > > we still have several concerns: > > > > > > > > - Is that safe to resume the IO Reactor > > > > > > > > unconditionally > > > > > > > > on any > > > > > > > > Exception? > > > > > > > > - It is still impossible to recover the IO Reactor if > > > > > > > > there is > > > > > > > > an > > > > > > > > Error > > > > > > > > rather than Exception, which is typically happening > > > > > > > > when > > > > > > > > the > > > > > > > > service > > > > > > > > is > > > > > > > > under traffic pressure. > > > > > > > > > > > > > > > > For the final point of Exception/Error recovery, we > > > > > > > > understand > > > > > > > > that > > > > > > > > it is > > > > > > > > difficult to decide how to handle the Exception/Error > > > > > > > > at > > > > > > > > library > > > > > > > > level as > > > > > > > > we have less knowledge about the runtime context. > > > > > > > > However, > > > > > > > > it is > > > > > > > > a > > > > > > > > painful > > > > > > > > task to developers who have to take the robustness into > > > > > > > > account. > > > > > > > > We > > > > > > > > believe > > > > > > > > that HttpAsyncClient should provide extra robustness > > > > > > > > mechanism to > > > > > > > > simplify > > > > > > > > the usage. > > > > > > > > > > > > > > > > We have a proposal like following: > > > > > > > > > > > > > > > > - The client instance created by > > > > > > > > "HttpAsyncClients.createDefault()" > > > > > > > > should > > > > > > > > have a default IOReactorExceptionHandler configured. > > > > > > > > > > > > > > > > - The guidance information of setting > > > > > > > > IOReactorExceptionHandler > > > > > > > > should be > > > > > > > > added into the example of customizing. > > > > > > > > > > > > > > > > - We also propose a default strategy of Exception > > > > > > > > handling > > > > > > > > as: > > > > > > > > resume > > > > > > > > on > > > > > > > > RuntimeException and crash on IOException, which is > > > > > > > > based > > > > > > > > on a > > > > > > > > hypothesis > > > > > > > > that RuntimeException is usually caused by user code > > > > > > > > and > > > > > > > > IOException > > > > > > > > is > > > > > > > > more likely suggesting a underlying network > > > > > > > > communication > > > > > > > > failure. > > > > > > > > > > > > > > > > - We also propose a mechanism which will check the > > > > > > > > IOReactor > > > > > > > > status > > > > > > > > and > > > > > > > > then reinitialize the client instance entirely when > > > > > > > > uncaught > > > > > > > > Exception/Error happens. > > > > > > > > > > > > > > > > As the proposal still requires polish and discussion, > > > > > > > > thus > > > > > > > > we are > > > > > > > > sending > > > > > > > > this mail to ask the opinion from you about it. > > > > > > > > > > > > > > > > > > > > > > Hi Rui > > > > > > > > > > > > > > I _think_ most of the technical issues have already been > > > > > > > addressed > > > > > > > in > > > > > > > HC 5.0. I am not sure HC 4.x is worth any major > > > > > > > refactoring > > > > > > > efforts > > > > > > > at > > > > > > > this point but you are certainly welcome to propose a PR > > > > > > > for > > > > > > > the > > > > > > > 4.x > > > > > > > release branch. The only potentially contentious issue > > > > > > > might > > > > > > > be > > > > > > > handing > > > > > > > of Errors but I am sure we can work something out. > > > > > > > > > > > > > > Please do however consider upgrading to HC 5.0 > > > > > > > > > > > > > > Cheers > > > > > > > > > > > > > > Oleg > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------- > > > > > > > ---- > > > > > > > ------ > > > > > > > ---- > > > > > > > To unsubscribe, e-mail: [email protected] > > > > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------- > > > ------ > > > To unsubscribe, e-mail: [email protected] > > > For additional commands, e-mail: [email protected] > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
