zentol opened a new pull request #8412: [FLINK-12111][tests] Harden AbstractTaskManagerProcessFailureRecoveryTest
URL: https://github.com/apache/flink/pull/8412
 
 
   ## What is the purpose of the change
   
   An assortment of changes to harden the `AbstractTaskManagerProcessFailureRecoveryTest`.
   
   ## Brief change log
   
   * removed unused field
   * no longer sets `taskManagerProcess1` to null, so that the process output is printed on failure
   * wait until the destroyed process has actually shut down.
     This prevents theoretical scenarios where the job finishes because the `destroy()` call takes a while to take effect.
   * reduce the number of initial TMs to 1.
     Previously, the batch test (if `ExecutionMode == BATCH`) could still succeed even if the new TM was never used.
     With only 1 initial TM, all tasks MUST be moved to the new TM once that TM crashes.
     Doubled the number of slots to compensate for the loss of a TM.
   * allow 2 restarts.
     For reasons not yet understood, this test could fail multiple times instead of just once.
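   The "wait until the destroyed process has actually shut down" item can be illustrated with plain `java.lang.Process`. This is a minimal sketch, not the test's actual code: it assumes the TaskManager runs as a `Process`, the helper name `destroyAndAwait` is hypothetical, the escalation to `destroyForcibly()` is an illustrative addition, and the demo uses the POSIX `sleep` command as a stand-in for a TaskManager.

```java
import java.util.concurrent.TimeUnit;

final class DestroyAndWait {

    /**
     * Hypothetical helper: requests termination and blocks until the process
     * has really exited, escalating to destroyForcibly() if the graceful
     * request does not take effect within the timeout.
     * Returns true if the process is no longer alive.
     */
    static boolean destroyAndAwait(Process process, long timeout, TimeUnit unit)
            throws InterruptedException {
        // destroy() only requests termination; it may return before the process exits.
        process.destroy();
        if (process.waitFor(timeout, unit)) {
            return true;
        }
        // The graceful request did not take effect in time, so force it.
        process.destroyForcibly();
        return process.waitFor(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a TaskManager process; assumes a POSIX `sleep` command.
        Process standIn = new ProcessBuilder("sleep", "60").start();
        boolean terminated = destroyAndAwait(standIn, 10, TimeUnit.SECONDS);
        System.out.println("terminated=" + terminated + ", alive=" + standIn.isAlive());
    }
}
```

   Only after `destroyAndAwait` returns `true` can the test be sure the old TM is gone, so a job that finishes afterwards must have used the new TM.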
   
   ## Verifying this change
   
   The issue with the BATCH execution mode could easily be reproduced (by skipping the start of the third TM), and this change should fix it in an obvious way.
   
   The restart fix is basically a shot in the dark and a band-aid; ideally we would find the underlying cause.
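   Why allowing 2 restarts helps when the job fails twice can be modeled with a simple fixed restart budget, similar in spirit to Flink's fixed-delay restart strategy. This is a stdlib-only sketch under stated assumptions, not Flink's implementation and not how the test configures restarts: the delay between attempts is omitted, and the names `RestartBudget` and `runWithRestarts` are hypothetical.

```java
import java.util.function.IntPredicate;

final class RestartBudget {

    /**
     * Hypothetical model of a fixed restart budget: runs the initial attempt
     * plus up to maxRestarts restarts, stopping at the first success.
     * attemptSucceeds receives the 0-based attempt index.
     * Returns the index of the successful attempt, or -1 if the job failed permanently.
     */
    static int runWithRestarts(int maxRestarts, IntPredicate attemptSucceeds) {
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            if (attemptSucceeds.test(attempt)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Scenario in which the job fails on its first two attempts.
        IntPredicate failsTwice = attempt -> attempt >= 2;
        System.out.println(runWithRestarts(1, failsTwice)); // -1: one restart is not enough
        System.out.println(runWithRestarts(2, failsTwice)); // 2: succeeds on the second restart
    }
}
```

   The model also shows why this is only a band-aid: a larger budget masks a second failure without explaining where it comes from.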
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.