[ 
https://issues.apache.org/jira/browse/AIRFLOW-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518211#comment-16518211
 ] 

Ksenia Stroykova commented on AIRFLOW-972:
------------------------------------------

This is a joblib issue.

In 11 version I can reproduce it with code:

 

 
{code}
from joblib import Parallel, delayed 
import multiprocessing 
from multiprocessing import Pool 
import subprocess 
import signal 

def f(i): 
    return multiprocessing.current_process().pid 

def f3(): 
    print('f3') 
    def signal_handler(signum, frame): 
        raise ValueError("received SIGTERM signal") 
    signal.signal(signal.SIGTERM, signal_handler) 
    for r in Parallel(n_jobs=5, verbose=1000)(delayed(f)(i) for i in 
range(20)): 
        print(r) 
{code}
And when I call f3() sometimes(!!!) I see an exception:
{code}
In [3]: f3()
f3
[Parallel(n_jobs=5)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=5)]: Batch computation too fast (0.0011s.) Setting 
batch_size=374.
[Parallel(n_jobs=5)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=5)]: Done   3 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   4 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   5 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   6 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   7 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   8 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done  18 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done  20 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done  20 out of  20 | elapsed:    0.0s finished
5260
5261
5263
5264
5262
5260
5261
5262
5261
5263
5260
5260
5260
5260
5260
5260
5260
5260
5260
5260

In [4]: f3()
f3
[Parallel(n_jobs=5)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=5)]: Batch computation too fast (0.0013s.) Setting 
batch_size=304.
[Parallel(n_jobs=5)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=5)]: Done   3 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   4 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   5 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   6 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   7 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   8 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done   9 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done  20 out of  20 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=5)]: Done  20 out of  20 | elapsed:    0.0s finished
Process ForkPoolWorker-15:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 258, in 
_bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/opt/conda/lib/python3.6/site-packages/joblib/pool.py", line 364, in get
    rrelease()
  File "<ipython-input-1-f5429385facc>", line 13, in signal_handler
    raise ValueError("received SIGTERM signal")
ValueError: received SIGTERM signal
{code}
 Airflow catches it and kills the task. If task is fast it ends successfully. 
If task is slow airflow tries to kill it and in the end task dies.

 

I also checked master version of joblib and an error seems to be fixed. Will 
check it with airflow.

> Airflow kills subprocesses created by task instances
> ----------------------------------------------------
>
>                 Key: AIRFLOW-972
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-972
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.1
>            Reporter: Richard Moorhead
>            Priority: Minor
>
> We have a task which creates multiple subprocesses via 
> [joblib|https://pythonhosted.org/joblib/parallel.html]; we're noticing that 
> airflow seems to kill the subprocesses prior to their completion. Is there 
> any way around this behavior?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to