vchiapaikeo commented on PR #37279:
URL: https://github.com/apache/airflow/pull/37279#issuecomment-1938757838
@pankajastro, @pankajkoti - this change is causing deferrable tasks that
should succeed to fail. After I reverted this commit, things work normally
again. Hoping you can help figure out the issue or revert this before the
providers push.
Sample DAG:
```py
from airflow import DAG
from airflow.providers.google.cloud.operators.kubernetes_engine import (
    GKEStartPodOperator,
)

DEFAULT_TASK_ARGS = {
    "owner": "gcp-data-platform",
    "start_date": "2021-04-20",
    "retries": 0,
    "retry_delay": 60,
}

with DAG(
    dag_id="test_gke_op",
    schedule_interval="@daily",
    max_active_runs=1,
    max_active_tasks=5,
    catchup=False,
    default_args=DEFAULT_TASK_ARGS,
) as dag:
    # Deferrable task that is expected to succeed.
    _ = GKEStartPodOperator(
        task_id="whoami",
        name="whoami",
        cmds=["gcloud"],
        arguments=["auth", "list"],
        image="gcr.io/google.com/cloudsdktool/cloud-sdk:slim",
        project_id="redacted-project-id",
        namespace="airflow-default",
        location="us-central1",
        cluster_name="airflow-gke-cluster",
        service_account_name="default",
        deferrable=True,
        # do_xcom_push=True,
    )

    # Deferrable task that is expected to fail (exits with code 1).
    _ = GKEStartPodOperator(
        task_id="fail",
        name="fail",
        cmds=["bash"],
        arguments=["-xc", "sleep 2 && exit 1"],
        image="gcr.io/google.com/cloudsdktool/cloud-sdk:slim",
        project_id="redacted-project-id",
        namespace="airflow-default",
        location="us-central1",
        cluster_name="airflow-gke-cluster",
        service_account_name="default",
        deferrable=True,
        # do_xcom_push=True,
    )
```
Expected result (obtained after reverting):
<img width="1158" alt="image"
src="https://github.com/apache/airflow/assets/9200263/57e89f7a-abe3-412c-b5e9-4d005fa12186">
Unexpected result (after rebasing - whoami should succeed):
<img width="993" alt="image"
src="https://github.com/apache/airflow/assets/9200263/4060dffd-6496-4e39-98da-5383a1db2407">
whoami logs:
```
fcd7bd221fe9
*** Found local files:
*** *
/root/airflow/logs/dag_id=test_gke_op/run_id=scheduled__2024-02-11T00:00:00+00:00/task_id=whoami/attempt=11.log
*** *
/root/airflow/logs/dag_id=test_gke_op/run_id=scheduled__2024-02-11T00:00:00+00:00/task_id=whoami/attempt=11.log.trigger.40.log
[2024-02-12, 09:12:23 EST] {taskinstance.py:1994} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=
[2024-02-12, 09:12:23 EST] {taskinstance.py:1994} INFO - Dependencies all
met for dep_context=requeueable deps ti=
[2024-02-12, 09:12:23 EST] {taskinstance.py:2208} INFO - Starting attempt 11
of 11
[2024-02-12, 09:12:23 EST] {taskinstance.py:2229} INFO - Executing on
2024-02-11 00:00:00+00:00
[2024-02-12, 09:12:23 EST] {standard_task_runner.py:60} INFO - Started
process 579 to run task
[2024-02-12, 09:12:23 EST] {standard_task_runner.py:87} INFO - Running:
['***', 'tasks', 'run', 'test_gke_op', 'whoami',
'scheduled__2024-02-11T00:00:00+00:00', '--job-id', '42', '--raw', '--subdir',
'DAGS_FOLDER/test_deferrable_xcom.py', '--cfg-path', '/tmp/tmpkupoc4wd']
[2024-02-12, 09:12:23 EST] {standard_task_runner.py:88} INFO - Job 42:
Subtask whoami
[2024-02-12, 09:12:23 EST] {task_command.py:423} INFO - Running on host
fcd7bd221fe9
[2024-02-12, 09:12:23 EST] {taskinstance.py:2529} INFO - Exporting env vars:
AIRFLOW_CTX_DAG_OWNER='gcp-data-platform' AIRFLOW_CTX_DAG_ID='test_gke_op'
AIRFLOW_CTX_TASK_ID='whoami'
AIRFLOW_CTX_EXECUTION_DATE='2024-02-11T00:00:00+00:00'
AIRFLOW_CTX_TRY_NUMBER='11'
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-02-11T00:00:00+00:00'
[2024-02-12, 09:12:23 EST] {connection.py:269} WARNING - Connection schemes
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
[2024-02-12, 09:12:23 EST] {base.py:83} INFO - Using connection ID
'google_cloud_default' for task execution.
[2024-02-12, 09:12:23 EST] {kubernetes_engine.py:289} INFO - Fetching
cluster (project_id=redacted-project-id, location=us-central1,
cluster_name=***-gke-cluster)
[2024-02-12, 09:12:23 EST] {credentials_provider.py:353} INFO - Getting
connection using `google.auth.default()` since no explicit credentials are
provided.
[2024-02-12, 09:12:24 EST] {pod.py:1079} INFO - Building pod whoami-75olgaah
with labels: {'dag_id': 'test_gke_op', 'task_id': 'whoami', 'run_id':
'scheduled__2024-02-11T0000000000-2e3e3ab6f', 'kubernetes_pod_operator':
'True', 'try_number': '11'}
[2024-02-12, 09:12:24 EST] {connection.py:269} WARNING - Connection schemes
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
[2024-02-12, 09:12:24 EST] {base.py:83} INFO - Using connection ID
'google_cloud_default' for task execution.
[2024-02-12, 09:12:24 EST] {credentials_provider.py:353} INFO - Getting
connection using `google.auth.default()` since no explicit credentials are
provided.
[2024-02-12, 09:12:25 EST] {taskinstance.py:2382} INFO - Pausing task as
DEFERRED. dag_id=test_gke_op, task_id=whoami, execution_date=20240211T000000,
start_date=20240212T141223
[2024-02-12, 09:12:25 EST] {local_task_job_runner.py:231} INFO - Task exited
with return code 100 (task deferral)
[2024-02-12, 09:12:26 EST] {pod.py:160} INFO - Checking pod
'whoami-75olgaah' in namespace '***-default'.
[2024-02-12, 09:12:26 EST] {connection.py:269} WARNING - Connection schemes
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
[2024-02-12, 09:12:26 EST] {base.py:83} INFO - Using connection ID
'google_cloud_default' for task execution.
[2024-02-12, 09:12:29 EST] {triggerer_job_runner.py:604} INFO - Trigger
test_gke_op/scheduled__2024-02-11T00:00:00+00:00/whoami/-1/11 (ID 15) fired:
TriggerEvent<{'status': 'done', 'namespace': 'airflow-default', 'pod_name':
'whoami-75olgaah'}>
[2024-02-12, 09:12:31 EST] {taskinstance.py:1994} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=
[2024-02-12, 09:12:31 EST] {taskinstance.py:1994} INFO - Dependencies all
met for dep_context=requeueable deps ti=
[2024-02-12, 09:12:31 EST] {taskinstance.py:2206} INFO - Resuming after
deferral
[2024-02-12, 09:12:31 EST] {taskinstance.py:2229} INFO - Executing on
2024-02-11 00:00:00+00:00
[2024-02-12, 09:12:31 EST] {standard_task_runner.py:60} INFO - Started
process 649 to run task
[2024-02-12, 09:12:31 EST] {standard_task_runner.py:87} INFO - Running:
['***', 'tasks', 'run', 'test_gke_op', 'whoami',
'scheduled__2024-02-11T00:00:00+00:00', '--job-id', '43', '--raw', '--subdir',
'DAGS_FOLDER/test_deferrable_xcom.py', '--cfg-path', '/tmp/tmpbojzr3x_']
[2024-02-12, 09:12:31 EST] {standard_task_runner.py:88} INFO - Job 43:
Subtask whoami
[2024-02-12, 09:12:31 EST] {task_command.py:423} INFO - Running on host
fcd7bd221fe9
[2024-02-12, 09:12:31 EST] {connection.py:269} WARNING - Connection schemes
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
[2024-02-12, 09:12:31 EST] {base.py:83} INFO - Using connection ID
'google_cloud_default' for task execution.
[2024-02-12, 09:12:31 EST] {credentials_provider.py:353} INFO - Getting
connection using `google.auth.default()` since no explicit credentials are
provided.
[2024-02-12, 09:12:34 EST] {taskinstance.py:2751} ERROR - Task failed with
exception
Traceback (most recent call last):
File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py",
line 753, in execute_complete
event["name"],
KeyError: 'name'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/airflow/airflow/models/taskinstance.py", line 446, in
_execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
File "/opt/airflow/airflow/models/taskinstance.py", line 416, in
_execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
File "/opt/airflow/airflow/models/baseoperator.py", line 1623, in
resume_execution
return execute_callable(context)
File
"/opt/airflow/airflow/providers/google/cloud/operators/kubernetes_engine.py",
line 593, in execute_complete
return super().execute_complete(context, event, **kwargs)
File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py",
line 786, in execute_complete
pod = self.pod_manager.await_pod_completion(pod, istio_enabled,
self.base_container_name)
File
"/opt/airflow/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line
611, in await_pod_completion
remote_pod = self.read_pod(pod)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line
289, in wrapped_f
return self(f, *args, **kw)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line
379, in __call__
do = self.iter(retry_state=retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line
325, in iter
raise retry_exc.reraise()
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line
158, in reraise
raise self.last_attempt.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in
result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in
__get_result
raise self._exception
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line
382, in __call__
result = fn(*args, **kwargs)
File
"/opt/airflow/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line
713, in read_pod
return self._client.read_namespaced_pod(pod.metadata.name,
pod.metadata.namespace)
AttributeError: 'NoneType' object has no attribute 'metadata'
[2024-02-12, 09:12:34 EST] {taskinstance.py:1166} INFO - Marking task as
FAILED. dag_id=test_gke_op, task_id=whoami, execution_date=20240211T000000,
start_date=20240212T141223, end_date=20240212T141234
[2024-02-12, 09:12:34 EST] {standard_task_runner.py:107} ERROR - Failed to
execute job 43 for task whoami ('NoneType' object has no attribute 'metadata';
649)
[2024-02-12, 09:12:34 EST] {local_task_job_runner.py:234} INFO - Task exited
with return code 1
[2024-02-12, 09:12:34 EST] {taskinstance.py:3332} INFO - 0 downstream tasks
scheduled from follow-on schedule check
```
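The log above hints at a key mismatch: the trigger fires an event carrying `pod_name`, while the first traceback shows `execute_complete` indexing `event["name"]`. The sketch below only reproduces that mismatch with the payload copied from the log. `lookup_pod_name` is a hypothetical helper written for illustration; it is not part of Airflow and not necessarily how the issue should be fixed.

```python
# Payload copied from the "Trigger ... fired" log line above.
event = {
    "status": "done",
    "namespace": "airflow-default",
    "pod_name": "whoami-75olgaah",
}


def lookup_pod_name(event: dict) -> str:
    """Hypothetical helper (not part of Airflow): accept either key spelling."""
    for key in ("name", "pod_name"):
        if key in event:
            return event[key]
    raise KeyError("event carries neither 'name' nor 'pod_name'")


# Direct indexing with "name" fails the same way the first traceback does.
try:
    event["name"]
except KeyError as exc:
    missing_key = exc.args[0]

print(missing_key)
print(lookup_pod_name(event))
```

If this reading is right, the later `AttributeError` on `pod.metadata` would be a follow-on effect: the pod lookup never ran, so the pod was still `None` when `await_pod_completion` was called.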