XComp commented on pull request #15898:
URL: https://github.com/apache/flink/pull/15898#issuecomment-891799420


   > Hi @zentol I just got back from a vacation so I'm looking at this again. 
I'm not sure what we want specification wise with concurrent failure support. I 
can imagine all sorts of things failing concurrently. It seems like maybe 
concurrent failures would be best tested in a more elaborate integration test 
to have a very clear expectation of correct behavior? Perhaps that could be a 
follow up ticket to make this current minimal exception handling landable 
quickly after I address the current feedback?
   
   Hi @bytesandwich , I'm back from vacation so I am able to answer your 
questions.
   Testing concurrent failures should be possible as part of the 
`AdaptiveSchedulerTest`. Similarly to what you've done in 
[AdaptiveSchedulerTest:929](https://github.com/apache/flink/blob/521e19eecadf39a226c5b5be4ed5348485656eab/flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveSchedulerTest.java#L929)
 with one `updateTaskExecutionState` call you should be able to do with two 
calls. The first call will make the `AdaptiveScheduler` switch into restarting 
state. Calling the `updateTaskExecutionState` again would not catch the second 
exception right now. Implementing the exception handling also in the 
`Restarting` state class should solve the issue.
   
   Analogously, that has to be done for cases where the scheduler does not 
switch into `Restarting` state but `Failing`. Does that make sense to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to