Abhilash Raj pushed to branch master at GNU Mailman / Mailman Core

fffad3c6 by Abhilash Raj at 2019-03-10T23:45:53Z
Catch ChildProcessError for failing CI jobs.

We are seeing a lot of CI jobs fail because of a transient ChildProcessError,
which in my opinion is caused by a race condition in our logic for terminating
processes that do not behave properly.

The bad behavior seems to stem from a TOCTTOU (time-of-check to time-of-use)
bug: when we check whether a process has died, it still appears to be alive,
but by the time we try to kill it, it has already died.

I don't know if there is a better way to fix that problem, but to make sure
that we don't keep failing CI jobs because of it, we catch the exception and
simply return from the function, since the job of the function (killing the
child process) is already done.
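
A minimal sketch of the approach described above, for illustration only: the
helper name `reap_child()` and its `timeout` parameter are hypothetical and not
part of Mailman. It treats both `ProcessLookupError` from `os.kill()` and
`ChildProcessError` from `os.waitpid()` as "the child is already gone" and
returns instead of failing. With `os.WNOHANG`, `os.waitpid()` returns `(0, 0)`
while the child is still running and `(pid, status)` once it has been reaped.

```python
import os
import signal
import time


def reap_child(pid, timeout=10.0):
    """Kill ``pid`` and wait for it, tolerating a child that is already gone.

    Hypothetical helper. Returns True once the child is known to be gone,
    False if it is still unreaped after ``timeout`` seconds.
    """
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        # The process exited and was reaped before we could signal it.
        pass
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # No such child any more: it was reaped somewhere between our
            # check and this wait (the TOCTTOU window). The goal is met.
            return True
        if reaped_pid == pid:
            # We reaped the child ourselves.
            return True
        # (0, 0): the child has not exited yet; poll again shortly.
        time.sleep(0.1)
    return False
```

Calling `reap_child()` twice on the same forked child should return True both
times; the second call takes the `ChildProcessError` path because the first
call already reaped the child.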

- - - - -
51f4a2c3 by Abhilash Raj at 2019-03-10T23:45:54Z
Merge branch 'fix-ci-failures' into 'master'

Catch ChildProcessError for failing CI jobs.

See merge request mailman/mailman!473
- - - - -

2 changed files:

- src/mailman/commands/docs/control.rst
- src/mailman/commands/tests/test_cli_control.py


@@ -51,5 +51,6 @@ stops all the child processes too.
     # Clean up.
     >>> from mailman.commands.tests.test_cli_control import (
-    ...     kill_with_extreme_prejudice)
+    ...     kill_with_extreme_prejudice, clean_stale_locks)
     >>> kill_with_extreme_prejudice(pid)
+    >>> clean_stale_locks()

@@ -151,15 +151,32 @@ def kill_with_extreme_prejudice(pid_or_pidfile=None):
             os.kill(pid, signal.SIGKILL)
         until = timedelta(seconds=10) + datetime.now()
         while datetime.now() < until:
-            status = os.waitpid(pid, os.WNOHANG)
-            if status == (0, 0):
-                # The child was reaped.
+            try:
+                os.waitpid(pid, os.WNOHANG)
+            except ChildProcessError:
+                # 2016-03-10 maxking: We are seeing ChildProcessError very
+                # often in CI due to the os.waitpid on L155 above. This is
+                # raised when there is no child process left. We are clearly in
+                # the arena of a race condition where the process was killed
+                # somewhere after we checked and before we tried to wait on
+                # it. TOCTTOU problem.
+                return
             print('WARNING: SIGKILL DID NOT EXIT PROCESS!', file=sys.stderr)
+def clean_stale_locks():
+    """Cleanup the master.pid and master.lck file, if they exist."""
+    # If the master process was force-killed during the test suite run, it is
+    # possible that the stale pid file was left. Clean that file up.
+    if os.path.exists(config.PID_FILE):
+        os.unlink(config.PID_FILE)
+    if os.path.exists(config.LOCK_FILE):
+        os.unlink(config.LOCK_FILE)
 class TestControl(unittest.TestCase):
     layer = ConfigLayer
     maxDiff = None
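
The `os.path.exists()` check followed by `os.unlink()` in `clean_stale_locks()`
has the same check-then-act shape as the `waitpid()` race. One possible
alternative, sketched here under the assumption that a missing file is the only
error worth ignoring, is to attempt the unlink and suppress
`FileNotFoundError`. `remove_stale_files()` is a hypothetical helper; in
Mailman it would be called with `config.PID_FILE` and `config.LOCK_FILE`.

```python
import contextlib
import os


def remove_stale_files(*paths):
    """Remove each path if present, without an exists()/unlink() race.

    Hypothetical helper for illustration; not part of Mailman.
    """
    for path in paths:
        # Checking os.path.exists() first leaves a window in which the file
        # can disappear before unlink() runs. Attempting the unlink directly
        # and ignoring FileNotFoundError closes that window.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(path)
```

Because a missing file is simply ignored, calling the helper repeatedly (for
example from several doctest cleanup blocks) is harmless.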

View it on GitLab: 

You're receiving this email because of your account on gitlab.com.

Mailman-checkins mailing list
