zentol opened a new pull request #8412: [FLINK-12111][tests] Harden AbstractTaskManagerProcessFailureRecoveryTest
URL: https://github.com/apache/flink/pull/8412

## What is the purpose of the change

Assortment of changes to improve/harden the `AbstractTaskManagerProcessFailureRecoveryTest`.

## Brief change log

* removed unused field
* no longer sets `taskManagerProcess1` to null so that the process output is printed on failure
* wait until destroyed process has actually shut down
  Prevents theoretical scenarios where the job can finish because the `destroy()` command takes a while to take effect.
* reduce number of initial TMs to 1
  The batch test could still succeed (if ExecutionMode == BATCH) even if the new TM was never used. Reduce the number of initial TMs to 1 so that once that TM crashes all tasks MUST be moved to the new TM. Doubled the number of slots to compensate for the loss of a TM.
* allow 2 restarts
  For some reason this test could fail multiple times, instead of just once.

## Verifying this change

The issue with the BATCH execution mode could be reproduced easily (just skip the start of the third TM), and the change should fix this in an obvious way.

The restart fix is basically a shot in the dark + band-aid; ideally we would find the underlying cause.
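
For illustration only, the "wait until destroyed process has actually shut down" item could look roughly like the sketch below. This is not the code from the PR: the class name `ProcessShutdown`, the helper `destroyAndAwait`, the timeout values, and the `sleep` command used in `main` (which assumes a Unix-like system) are all hypothetical. It relies only on the standard `java.lang.Process` methods `destroy()`, `destroyForcibly()`, `waitFor(long, TimeUnit)` and `isAlive()`.

```java
import java.util.concurrent.TimeUnit;

public class ProcessShutdown {

    /**
     * Hypothetical helper: requests termination and blocks until the process
     * has exited, escalating to a forcible kill if the first wait times out.
     *
     * @return true if the process is confirmed to have exited
     */
    public static boolean destroyAndAwait(Process process, long timeout, TimeUnit unit)
            throws InterruptedException {
        // destroy() only requests termination; it may return before the process is gone.
        process.destroy();
        if (process.waitFor(timeout, unit)) {
            return true;
        }
        // The process did not exit in time, so escalate and wait once more.
        process.destroyForcibly();
        return process.waitFor(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a `sleep` binary is available (Unix-like system).
        Process p = new ProcessBuilder("sleep", "60").start();
        boolean stopped = destroyAndAwait(p, 10, TimeUnit.SECONDS);
        System.out.println("stopped=" + stopped + ", alive=" + p.isAlive());
    }
}
```

Blocking on `waitFor` before continuing means the test only proceeds once the killed process can no longer make progress, which is the race the change log describes.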
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
