vchiapaikeo commented on PR #37279:
URL: https://github.com/apache/airflow/pull/37279#issuecomment-1938757838

   @pankajastro, @pankajkoti - this change is causing failures for deferrable 
tasks that should succeed. After I reverted this commit, the tasks ran normally 
again. Could you help figure out the issue, or revert this before the providers 
push?
   
   Sample DAG:
   
   ```py
   from airflow import DAG
   
   from airflow.providers.google.cloud.operators.kubernetes_engine import (
       GKEStartPodOperator,
   )
   
   
   DEFAULT_TASK_ARGS = {
       "owner": "gcp-data-platform",
       "start_date": "2021-04-20",
       "retries": 0,
       "retry_delay": 60,
   }
   
   with DAG(
       dag_id="test_gke_op",
       schedule_interval="@daily",
       max_active_runs=1,
       max_active_tasks=5,
       catchup=False,
       default_args=DEFAULT_TASK_ARGS,
   ) as dag:
   
       _ = GKEStartPodOperator(
           task_id="whoami",
           name="whoami",
           cmds=["gcloud"],
           arguments=["auth", "list"],
           image="gcr.io/google.com/cloudsdktool/cloud-sdk:slim",
           project_id="redacted-project-id",
           namespace="airflow-default",
           location="us-central1",
           cluster_name="airflow-gke-cluster",
           service_account_name="default",
           deferrable=True,
           # do_xcom_push=True,
       )
   
       _ = GKEStartPodOperator(
           task_id="fail",
           name="fail",
           cmds=["bash"],
           arguments=["-xc", "sleep 2 && exit 1"],
           image="gcr.io/google.com/cloudsdktool/cloud-sdk:slim",
           project_id="redacted-project-id",
           namespace="airflow-default",
           location="us-central1",
           cluster_name="airflow-gke-cluster",
           service_account_name="default",
           deferrable=True,
           # do_xcom_push=True,
       )
   ```
   
   Expected result (obtained after reverting):
   
   <img width="1158" alt="image" src="https://github.com/apache/airflow/assets/9200263/57e89f7a-abe3-412c-b5e9-4d005fa12186">
   
   
   Unexpected result (after rebasing - whoami should succeed):
   
   <img width="993" alt="image" src="https://github.com/apache/airflow/assets/9200263/4060dffd-6496-4e39-98da-5383a1db2407">
   
   
   whoami logs:
   
   ```
   fcd7bd221fe9
   *** Found local files:
   ***   * 
/root/airflow/logs/dag_id=test_gke_op/run_id=scheduled__2024-02-11T00:00:00+00:00/task_id=whoami/attempt=11.log
   ***   * 
/root/airflow/logs/dag_id=test_gke_op/run_id=scheduled__2024-02-11T00:00:00+00:00/task_id=whoami/attempt=11.log.trigger.40.log
   [2024-02-12, 09:12:23 EST] {taskinstance.py:1994} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=
   [2024-02-12, 09:12:23 EST] {taskinstance.py:1994} INFO - Dependencies all 
met for dep_context=requeueable deps ti=
   [2024-02-12, 09:12:23 EST] {taskinstance.py:2208} INFO - Starting attempt 11 
of 11
   [2024-02-12, 09:12:23 EST] {taskinstance.py:2229} INFO - Executing  on 
2024-02-11 00:00:00+00:00
   [2024-02-12, 09:12:23 EST] {standard_task_runner.py:60} INFO - Started 
process 579 to run task
   [2024-02-12, 09:12:23 EST] {standard_task_runner.py:87} INFO - Running: 
['***', 'tasks', 'run', 'test_gke_op', 'whoami', 
'scheduled__2024-02-11T00:00:00+00:00', '--job-id', '42', '--raw', '--subdir', 
'DAGS_FOLDER/test_deferrable_xcom.py', '--cfg-path', '/tmp/tmpkupoc4wd']
   [2024-02-12, 09:12:23 EST] {standard_task_runner.py:88} INFO - Job 42: 
Subtask whoami
   [2024-02-12, 09:12:23 EST] {task_command.py:423} INFO - Running  on host 
fcd7bd221fe9
   [2024-02-12, 09:12:23 EST] {taskinstance.py:2529} INFO - Exporting env vars: 
AIRFLOW_CTX_DAG_OWNER='gcp-data-platform' AIRFLOW_CTX_DAG_ID='test_gke_op' 
AIRFLOW_CTX_TASK_ID='whoami' 
AIRFLOW_CTX_EXECUTION_DATE='2024-02-11T00:00:00+00:00' 
AIRFLOW_CTX_TRY_NUMBER='11' 
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-02-11T00:00:00+00:00'
   [2024-02-12, 09:12:23 EST] {connection.py:269} WARNING - Connection schemes 
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
   [2024-02-12, 09:12:23 EST] {base.py:83} INFO - Using connection ID 
'google_cloud_default' for task execution.
   [2024-02-12, 09:12:23 EST] {kubernetes_engine.py:289} INFO - Fetching 
cluster (project_id=redacted-project-id, location=us-central1, 
cluster_name=***-gke-cluster)
   [2024-02-12, 09:12:23 EST] {credentials_provider.py:353} INFO - Getting 
connection using `google.auth.default()` since no explicit credentials are 
provided.
   [2024-02-12, 09:12:24 EST] {pod.py:1079} INFO - Building pod whoami-75olgaah 
with labels: {'dag_id': 'test_gke_op', 'task_id': 'whoami', 'run_id': 
'scheduled__2024-02-11T0000000000-2e3e3ab6f', 'kubernetes_pod_operator': 
'True', 'try_number': '11'}
   [2024-02-12, 09:12:24 EST] {connection.py:269} WARNING - Connection schemes 
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
   [2024-02-12, 09:12:24 EST] {base.py:83} INFO - Using connection ID 
'google_cloud_default' for task execution.
   [2024-02-12, 09:12:24 EST] {credentials_provider.py:353} INFO - Getting 
connection using `google.auth.default()` since no explicit credentials are 
provided.
   [2024-02-12, 09:12:25 EST] {taskinstance.py:2382} INFO - Pausing task as 
DEFERRED. dag_id=test_gke_op, task_id=whoami, execution_date=20240211T000000, 
start_date=20240212T141223
   [2024-02-12, 09:12:25 EST] {local_task_job_runner.py:231} INFO - Task exited 
with return code 100 (task deferral)
   [2024-02-12, 09:12:26 EST] {pod.py:160} INFO - Checking pod 
'whoami-75olgaah' in namespace '***-default'.
   [2024-02-12, 09:12:26 EST] {connection.py:269} WARNING - Connection schemes 
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
   [2024-02-12, 09:12:26 EST] {base.py:83} INFO - Using connection ID 
'google_cloud_default' for task execution.
   [2024-02-12, 09:12:29 EST] {triggerer_job_runner.py:604} INFO - Trigger 
test_gke_op/scheduled__2024-02-11T00:00:00+00:00/whoami/-1/11 (ID 15) fired: 
TriggerEvent<{'status': 'done', 'namespace': 'airflow-default', 'pod_name': 
'whoami-75olgaah'}>
   [2024-02-12, 09:12:31 EST] {taskinstance.py:1994} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=
   [2024-02-12, 09:12:31 EST] {taskinstance.py:1994} INFO - Dependencies all 
met for dep_context=requeueable deps ti=
   [2024-02-12, 09:12:31 EST] {taskinstance.py:2206} INFO - Resuming after 
deferral
   [2024-02-12, 09:12:31 EST] {taskinstance.py:2229} INFO - Executing  on 
2024-02-11 00:00:00+00:00
   [2024-02-12, 09:12:31 EST] {standard_task_runner.py:60} INFO - Started 
process 649 to run task
   [2024-02-12, 09:12:31 EST] {standard_task_runner.py:87} INFO - Running: 
['***', 'tasks', 'run', 'test_gke_op', 'whoami', 
'scheduled__2024-02-11T00:00:00+00:00', '--job-id', '43', '--raw', '--subdir', 
'DAGS_FOLDER/test_deferrable_xcom.py', '--cfg-path', '/tmp/tmpbojzr3x_']
   [2024-02-12, 09:12:31 EST] {standard_task_runner.py:88} INFO - Job 43: 
Subtask whoami
   [2024-02-12, 09:12:31 EST] {task_command.py:423} INFO - Running  on host 
fcd7bd221fe9
   [2024-02-12, 09:12:31 EST] {connection.py:269} WARNING - Connection schemes 
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
   [2024-02-12, 09:12:31 EST] {base.py:83} INFO - Using connection ID 
'google_cloud_default' for task execution.
   [2024-02-12, 09:12:31 EST] {credentials_provider.py:353} INFO - Getting 
connection using `google.auth.default()` since no explicit credentials are 
provided.
   [2024-02-12, 09:12:34 EST] {taskinstance.py:2751} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", 
line 753, in execute_complete
       event["name"],
   KeyError: 'name'
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/opt/airflow/airflow/models/taskinstance.py", line 446, in 
_execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
     File "/opt/airflow/airflow/models/taskinstance.py", line 416, in 
_execute_callable
       return execute_callable(context=context, **execute_callable_kwargs)
     File "/opt/airflow/airflow/models/baseoperator.py", line 1623, in 
resume_execution
       return execute_callable(context)
     File 
"/opt/airflow/airflow/providers/google/cloud/operators/kubernetes_engine.py", 
line 593, in execute_complete
       return super().execute_complete(context, event, **kwargs)
     File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", 
line 786, in execute_complete
       pod = self.pod_manager.await_pod_completion(pod, istio_enabled, 
self.base_container_name)
     File 
"/opt/airflow/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 
611, in await_pod_completion
       remote_pod = self.read_pod(pod)
     File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 
289, in wrapped_f
       return self(f, *args, **kw)
     File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 
379, in __call__
       do = self.iter(retry_state=retry_state)
     File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 
325, in iter
       raise retry_exc.reraise()
     File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 
158, in reraise
       raise self.last_attempt.result()
     File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in 
result
       return self.__get_result()
     File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in 
__get_result
       raise self._exception
     File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 
382, in __call__
       result = fn(*args, **kwargs)
     File 
"/opt/airflow/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 
713, in read_pod
       return self._client.read_namespaced_pod(pod.metadata.name, 
pod.metadata.namespace)
   AttributeError: 'NoneType' object has no attribute 'metadata'
   [2024-02-12, 09:12:34 EST] {taskinstance.py:1166} INFO - Marking task as 
FAILED. dag_id=test_gke_op, task_id=whoami, execution_date=20240211T000000, 
start_date=20240212T141223, end_date=20240212T141234
   [2024-02-12, 09:12:34 EST] {standard_task_runner.py:107} ERROR - Failed to 
execute job 43 for task whoami ('NoneType' object has no attribute 'metadata'; 
649)
   [2024-02-12, 09:12:34 EST] {local_task_job_runner.py:234} INFO - Task exited 
with return code 1
   [2024-02-12, 09:12:34 EST] {taskinstance.py:3332} INFO - 0 downstream tasks 
scheduled from follow-on schedule check
   ```
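
Reading the log above, the trigger fired an event keyed by `pod_name`, while the traceback shows `execute_complete` looking up `event["name"]`. The sketch below reproduces only that dictionary mismatch with plain Python. The event payload is copied from the triggerer log line; the two helper functions are hypothetical names made up for illustration, and the tolerant lookup is just one possible way to accept both keys, not the actual provider code or the fix that should be merged.

```python
from typing import Optional

# Event payload copied from the triggerer log line above.
event = {
    "status": "done",
    "namespace": "airflow-default",
    "pod_name": "whoami-75olgaah",
}


def pod_name_strict(event: dict) -> str:
    """Hypothetical helper mirroring the lookup in the traceback: event["name"]."""
    return event["name"]


def pod_name_tolerant(event: dict) -> Optional[str]:
    """Hypothetical helper (an assumption, not the provider's code): accept either key."""
    return event.get("name") or event.get("pod_name")


if __name__ == "__main__":
    try:
        pod_name_strict(event)
    except KeyError as exc:
        # Same failure as in the log: KeyError: 'name'
        print(f"strict lookup failed: KeyError {exc}")

    # The tolerant lookup still finds the pod name under "pod_name".
    print(pod_name_tolerant(event))
```

If the producer and consumer of the event agree on a single key, neither helper is needed; the point is only that the key emitted by the trigger and the key read on resume differ in the failing run.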

