[ 
https://issues.apache.org/jira/browse/AIRFLOW-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695841#comment-16695841
 ] 

ASF GitHub Bot commented on AIRFLOW-3263:
-----------------------------------------

kaxil closed pull request #4108: [AIRFLOW-3263] Ignore exception when 'run' 
tries to kill already killed job
URL: https://github.com/apache/incubator-airflow/pull/4108

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/airflow/utils/helpers.py b/airflow/utils/helpers.py
index 3ebd5f75c6..328147c1cf 100644
--- a/airflow/utils/helpers.py
+++ b/airflow/utils/helpers.py
@@ -22,6 +22,8 @@
 from __future__ import print_function
 from __future__ import unicode_literals
 
+import errno
+
 import psutil
 
 from builtins import input
@@ -234,7 +236,15 @@ def on_terminate(p):
     children = parent.children(recursive=True)
     children.append(parent)
 
-    log.info("Sending %s to GPID %s", sig, os.getpgid(pid))
+    try:
+        pg = os.getpgid(pid)
+    except OSError as err:
+        # Skip if not such process - we experience a race and it just terminated
+        if err.errno == errno.ESRCH:
+            return
+        raise
+
+    log.info("Sending %s to GPID %s", sig, pg)
     os.killpg(os.getpgid(pid), sig)
 
     gone, alive = psutil.wait_procs(children, timeout=timeout, callback=on_terminate)
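
Note that the unchanged context line `os.killpg(os.getpgid(pid), sig)` still looks up the process group a second time, so the same race could in principle recur between the guarded lookup and the kill. A minimal sketch of one way to narrow that window is shown below. It assumes a POSIX system, and the helper name `signal_process_group` is hypothetical (it is not part of Airflow or of this pull request). It reuses the group ID from a single lookup and treats ESRCH from either call as "already gone":

```python
import errno
import os
import signal


def signal_process_group(pid, sig=signal.SIGTERM):
    """Send sig to the process group of pid.

    Returns True if the signal was sent, False if the process (or its
    group) had already disappeared. Hypothetical helper for illustration.
    """
    try:
        # Look the group up once and reuse the result for the kill.
        pgid = os.getpgid(pid)
        os.killpg(pgid, sig)
    except OSError as err:
        # ESRCH means "no such process": we lost the race, nothing to kill.
        if err.errno == errno.ESRCH:
            return False
        raise
    return True
```

A caller could then skip the subsequent wait when the helper returns False. Any other OSError (for example EPERM) is still propagated, as in the patch above.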


 



> CLI 'run' method sometimes exits with error when there is a race on killing 
> airflow job
> ---------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3263
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3263
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: cli
>            Reporter: Jarek Potiuk
>            Assignee: Jarek Potiuk
>            Priority: Minor
>
> Sometimes when you run tasks from the command line, you get exit code 1 due to
> a race condition: the job runner tries to get the process group of a process
> that has already terminated in the meantime.
> This results in the following exception:
> Traceback (most recent call last):
>   File "/Users/potiuk/.virtualenvs/incubator-airflow/bin/airflow", line 7, in <module>
>     exec(compile(f.read(), __file__, 'exec'))
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/bin/airflow", line 32, in <module>
>     args.func(args)
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/utils/cli.py", line 74, in wrapper
>     return f(*args, **kwargs)
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/bin/cli.py", line 536, in run
>     _run(args, dag, ti)
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/bin/cli.py", line 447, in _run
>     run_job.run()
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/jobs.py", line 203, in run
>     self._execute()
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/jobs.py", line 2666, in _execute
>     self.on_kill()
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/jobs.py", line 2669, in on_kill
>     self.task_runner.terminate()
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/task/task_runner/standard_task_runner.py", line 41, in terminate
>     reap_process_group(self.process.pid, self.log)
>   File "/Users/potiuk/code/google-airflow-breeze/polidea/incubator-airflow/airflow/utils/helpers.py", line 237, in reap_process_group
>     log.info("Sending %s to GPID %s", sig, os.getpgid(pid))
> OSError: [Errno 3] No such process
>  
> I am going to provide a fix shortly.
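
The error at the bottom of the traceback can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX system and that the child's PID is not reused in the meantime. The helper name `lookup_pgid_errno` is hypothetical and exists only for this illustration. It starts a short-lived child, reaps it, and then asks for its process group, which is the lookup the job runner performs after losing the race:

```python
import errno
import os
import subprocess
import sys


def lookup_pgid_errno(pid):
    """Return None if os.getpgid(pid) succeeds, else the errno it failed with."""
    try:
        os.getpgid(pid)
    except OSError as err:
        return err.errno
    return None


# Start a child that exits immediately, then reap it so its PID is released.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()

# The process no longer exists, so the lookup is expected to fail with ESRCH.
result = lookup_pgid_errno(child.pid)
print("lookup failed with ESRCH:", result == errno.ESRCH)
```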



