Abhilash Raj pushed to branch master at GNU Mailman / Mailman Core

Commits:
fffad3c6 by Abhilash Raj at 2019-03-10T23:45:53Z
Catch ChildProcessError for failing CI jobs.

We are seeing a lot of failed CI jobs due to transient ChildProcessError
which in my opinion are caused due to a race condition in our logic of
terminating processes that do not behave properly.

The bad behavior seems to stem from a TOCTTOU bug, where when we check for a
process to have died, they seem to be alive, but when we try to kill them,
they have already died at that point.

I don't know if there is a better way to fix that problem, but, just to make
sure that we don't keep failing CI jobs because of it, we are going to catch
the exception raised and just return the function since the job of the
function (to kill the child process) is done.

- - - - -
51f4a2c3 by Abhilash Raj at 2019-03-10T23:45:54Z
Merge branch 'fix-ci-failures' into 'master'

Catch ChildProcessError for failing CI jobs.

See merge request mailman/mailman!473
- - - - -


2 changed files:

- src/mailman/commands/docs/control.rst
- src/mailman/commands/tests/test_cli_control.py


Changes:

=====================================
src/mailman/commands/docs/control.rst
=====================================
@@ -51,5 +51,6 @@ stops all the child processes too.
 ..
     # Clean up.
     >>> from mailman.commands.tests.test_cli_control import (
-    ...     kill_with_extreme_prejudice)
+    ...     kill_with_extreme_prejudice, clean_stale_locks)
     >>> kill_with_extreme_prejudice(pid)
+    >>> clean_stale_locks()


=====================================
src/mailman/commands/tests/test_cli_control.py
=====================================
@@ -151,15 +151,32 @@ def kill_with_extreme_prejudice(pid_or_pidfile=None):
     os.kill(pid, signal.SIGKILL)
     until = timedelta(seconds=10) + datetime.now()
     while datetime.now() < until:
-        status = os.waitpid(pid, os.WNOHANG)
-        if status == (0, 0):
-            # The child was reaped.
+        try:
+            os.waitpid(pid, os.WNOHANG)
+        except ChildProcessError:
+            # 2016-03-10 maxking: We are seeing ChildProcessError very
+            # often in CI due to the os.waitpid on L155 above. This is
+            # raised when there is no child process left. We are clearly in
+            # the arena of a race condition where the process was killed
+            # somewhere after we checked and before we tried to wait on
+            # it. TOCTTOU problem.
             return
         time.sleep(0.1)
     else:
         print('WARNING: SIGKILL DID NOT EXIT PROCESS!', file=sys.stderr)


+@public
+def clean_stale_locks():
+    """Cleanup the master.pid and master.lck file, if they exist."""
+    # If the master process was force-killed during the test suite run, it is
+    # possible that the stale pid file was left. Clean that file up.
+    if os.path.exists(config.PID_FILE):
+        os.unlink(config.PID_FILE)
+    if os.path.exists(config.LOCK_FILE):
+        os.unlink(config.LOCK_FILE)
+
+
 class TestControl(unittest.TestCase):
     layer = ConfigLayer
     maxDiff = None



View it on GitLab: https://gitlab.com/mailman/mailman/compare/9d67d260a0c00878bfa249bafe472d7207de3f21...51f4a2c3abf119b27c468d992ce2bc605aae74db
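
For readers following the race described in the commit message, here is a minimal standalone sketch of the kill-then-reap pattern. It is not the Mailman helper: `kill_and_reap` and its `timeout` parameter are hypothetical names used only for illustration, and the code assumes a POSIX system. Per the Python documentation, `os.waitpid(pid, os.WNOHANG)` returns `(0, 0)` while the child has no status available yet and `(pid, status)` once it has been reaped, so the sketch checks the returned pid and treats `ChildProcessError` as "the child is already gone".

```python
import os
import signal
import subprocess
import sys
import time
from datetime import datetime, timedelta


def kill_and_reap(pid, timeout=timedelta(seconds=10)):
    """Hypothetical helper: SIGKILL a child, then poll until it is reaped.

    Returns True once the child is known to be gone, False on timeout.
    """
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        # The process vanished between our caller's check and the kill.
        return True
    until = datetime.now() + timeout
    while datetime.now() < until:
        try:
            reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # No such child is left to wait on; it was reaped elsewhere.
            return True
        if reaped_pid == pid:
            # waitpid collected the exit status, so the child is reaped.
            return True
        # (0, 0) means the child has not exited yet; try again shortly.
        time.sleep(0.1)
    return False


if __name__ == "__main__":
    # Spawn a throwaway sleeper process and terminate it.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    print("reaped:", kill_and_reap(child.pid))
```

Handling both `ProcessLookupError` and `ChildProcessError` covers the two points at which the process can disappear between the check and the use; whether that is appropriate for the Mailman test suite is a judgment for its maintainers.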