Victor created AIRFLOW-3378:
-------------------------------

             Summary: KubernetesPodOperator does not delete on timeout failure
                 Key: AIRFLOW-3378
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3378
             Project: Apache Airflow
          Issue Type: Bug
          Components: kubernetes
    Affects Versions: 1.10.1
            Reporter: Victor


I am runnig airflow 1.10.1rc2 on kubernetes with the LocalExecutior and 
KubernetesPodOperator to run pods.

 

When a failure happens because the pod can't be created (see logs below), the 
whole operator fails and if is_delete_operator_pod is set to True, the pod is 
NOT deleted as expected.

 

Logs:
{noformat}
*** Reading local file: 
/airflow/logs/demo/datapipe/2018-11-21T08:29:57.206456+00:00/1.log
[2018-11-21 08:30:02,027] {models.py:1361} INFO - Dependencies all met for 
<TaskInstance: demo.datapipe 2018-11-21T08:29:57.206456+00:00 [queued]>
[2018-11-21 08:30:02,040] {models.py:1361} INFO - Dependencies all met for 
<TaskInstance: demo.datapipe 2018-11-21T08:29:57.206456+00:00 [queued]>
[2018-11-21 08:30:02,041] {models.py:1573} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------

[2018-11-21 08:30:02,066] {models.py:1595} INFO - Executing 
<Task(KubernetesPodOperator): datapipe> on 2018-11-21T08:29:57.206456+00:00
[2018-11-21 08:30:02,067] {base_task_runner.py:118} INFO - Running: ['bash', 
'-c', 'airflow run demo datapipe 2018-11-21T08:29:57.206456+00:00 --job_id 4 
--raw -sd /usr/local/airflow/dags_volume/..data/demo.py --cfg_path 
/tmp/tmp5qa5qyhs']
[2018-11-21 08:30:03,065] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe [2018-11-21 08:30:03,065] {settings.py:174} INFO - 
setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2018-11-21 08:30:03,734] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe [2018-11-21 08:30:03,732] {__init__.py:51} INFO - Using executor 
LocalExecutor
[2018-11-21 08:30:04,038] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe [2018-11-21 08:30:04,038] {models.py:271} INFO - Filling up the DagBag 
from /usr/local/airflow/dags_volume/..data/demo.py
[2018-11-21 08:30:04,590] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe [2018-11-21 08:30:04,590] {cli.py:484} INFO - Running <TaskInstance: 
demo.datapipe 2018-11-21T08:29:57.206456+00:00 [running]> on host 
infra-airflow-6d78c56489-r9trl
[2018-11-21 08:30:04,684] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:04,676] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:05,698] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:05,698] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:06,706] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:06,706] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:07,715] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:07,714] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:08,723] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:08,723] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:09,730] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:09,730] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:10,736] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:10,736] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:11,742] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:11,742] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:12,748] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:12,748] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:13,754] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:13,754] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:14,764] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:14,764] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:15,769] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:15,769] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:16,777] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:16,777] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:17,783] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:17,783] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:18,790] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:18,790] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:19,796] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:19,796] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:20,802] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:20,802] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:21,808] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:21,808] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:22,815] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:22,815] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:23,822] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:23,822] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:24,833] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:24,833] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an 
event of type Pending
[2018-11-21 08:30:24,841] {models.py:1760} ERROR - Pod Launching failed: Pod 
took too long to start
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py",
 line 123, in execute
    get_logs=self.get_logs)
  File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py",
 line 86, in run_pod
    raise AirflowException("Pod took too long to start")
airflow.exceptions.AirflowException: Pod took too long to start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, 
in _run_raw_task
    result = task_copy.execute(context=context)
  File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py",
 line 135, in execute
    raise AirflowException('Pod Launching failed: {error}'.format(error=ex))
airflow.exceptions.AirflowException: Pod Launching failed: Pod took too long to 
start
[2018-11-21 08:30:24,844] {models.py:1791} INFO - Marking task as FAILED.
[2018-11-21 08:30:25,001] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:25,001] {email.py:127} INFO - Sent an alert email to 
['victor.n...@brennus-analytics.com']
[2018-11-21 08:30:26,646] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe Traceback (most recent call last):
[2018-11-21 08:30:26,646] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py",
 line 123, in execute
[2018-11-21 08:30:26,646] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     get_logs=self.get_logs)
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py",
 line 86, in run_pod
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     raise AirflowException("Pod took too long to start")
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe airflow.exceptions.AirflowException: Pod took too long to start
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe During handling of the above exception, another exception occurred:
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe Traceback (most recent call last):
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File "/usr/local/bin/airflow", line 32, in <module>
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     args.func(args)
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/utils/cli.py", 
line 74, in wrapper
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     return f(*args, **kwargs)
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", 
line 490, in run
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     _run(args, dag, ti)
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", 
line 406, in _run
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     pool=args.pool,
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", 
line 74, in wrapper
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     return func(*args, **kwargs)
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", 
line 1659, in _run_raw_task
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     result = task_copy.execute(context=context)
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe   File 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py",
 line 135, in execute
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe     raise AirflowException('Pod Launching failed: 
{error}'.format(error=ex))
[2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask 
datapipe airflow.exceptions.AirflowException: Pod Launching failed: Pod took 
too long to start
[2018-11-21 08:30:27,054] {logging_mixin.py:95} INFO - [2018-11-21 
08:30:27,053] {jobs.py:2627} INFO - Task exited with return code 1
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to