itsnotapt opened a new issue, #38003:
URL: https://github.com/apache/airflow/issues/38003

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Apache Airflow version
   
   2.8.2
   
   ### Operating System
   
   apache/airflow:2.8.2-python3.10
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
There seems to be roughly a 50/50 chance that the container logs are actually returned from the pod.
   
   I'm expecting the following:
   ```
   [2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: /opt/airflow
   [2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: Hello world!
   ```
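The script the pod runs is not included in the report; judging from the expected container logs (the working directory, then a greeting), `python_scripts/hello_world.py` is presumably something like the following sketch (the exact contents are an assumption):

```python
# Hypothetical contents of python_scripts/hello_world.py, inferred from the
# expected container logs above: first the working directory, then a greeting.
import os


def main() -> None:
    print(os.getcwd())  # inside the pod this would show /opt/airflow
    print("Hello world!")


if __name__ == "__main__":
    main()
```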
   
   Successful log:
   ```
   [2024-03-08, 22:16:28 UTC] {pod.py:1057} INFO - Building pod airflow-pod-uvaridfc with labels: {'dag_id': 'example_python_pod', 'task_id': 'run_pod', 'run_id': 'manual__2024-03-08T221625.9866800000-564be90aa', 'kubernetes_pod_operator': 'True', 'try_number': '1'}
   [2024-03-08, 22:16:29 UTC] {taskinstance.py:2367} INFO - Pausing task as DEFERRED. dag_id=example_python_pod, task_id=run_pod, execution_date=20240308T221625, start_date=20240308T221627
   [2024-03-08, 22:16:29 UTC] {local_task_job_runner.py:231} INFO - Task exited with return code 100 (task deferral)
   [2024-03-08, 22:18:31 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: example_python_pod.run_pod manual__2024-03-08T22:16:25.986680+00:00 [queued]>
   [2024-03-08, 22:18:31 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: example_python_pod.run_pod manual__2024-03-08T22:16:25.986680+00:00 [queued]>
   [2024-03-08, 22:18:31 UTC] {taskinstance.py:2191} INFO - Resuming after deferral
   [2024-03-08, 22:18:31 UTC] {taskinstance.py:2214} INFO - Executing <Task(KubernetesPodOperator): run_pod> on 2024-03-08 22:16:25.986680+00:00
   [2024-03-08, 22:18:31 UTC] {standard_task_runner.py:60} INFO - Started process 200 to run task
   [2024-03-08, 22:18:31 UTC] {standard_task_runner.py:87} INFO - Running: ['airflow', 'tasks', 'run', 'example_python_pod', 'run_pod', 'manual__2024-03-08T22:16:25.986680+00:00', '--job-id', '468', '--raw', '--subdir', 'DAGS_FOLDER/example/example_python_pod.py', '--cfg-path', '/tmp/tmpov5tx0h_']
   [2024-03-08, 22:18:31 UTC] {standard_task_runner.py:88} INFO - Job 468: Subtask run_pod
   [2024-03-08, 22:18:31 UTC] {task_command.py:423} INFO - Running <TaskInstance: example_python_pod.run_pod manual__2024-03-08T22:16:25.986680+00:00 [running]> on host airflow-service-gamedev-worker-0.airflow-service-gamedev-worker.team-ecosec.svc.cluster.local
   [2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: /opt/airflow
   [2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: Hello world!
   [2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: 
   [2024-03-08, 22:18:33 UTC] {pod_manager.py:798} INFO - Running command... if [ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo __airflow_xcom_result_empty__; fi
   [2024-03-08, 22:18:33 UTC] {pod_manager.py:798} INFO - Running command... kill -s SIGINT 1
   [2024-03-08, 22:18:34 UTC] {pod.py:559} INFO - xcom result file is empty.
   [2024-03-08, 22:18:34 UTC] {pod_manager.py:616} INFO - Pod airflow-pod-uvaridfc has phase Running
   [2024-03-08, 22:18:36 UTC] {pod_manager.py:616} INFO - Pod airflow-pod-uvaridfc has phase Running
   [2024-03-08, 22:18:38 UTC] {pod.py:914} INFO - Skipping deleting pod: airflow-pod-uvaridfc
   [2024-03-08, 22:18:38 UTC] {taskinstance.py:1149} INFO - Marking task as SUCCESS. dag_id=example_python_pod, task_id=run_pod, execution_date=20240308T221625, start_date=20240308T221627, end_date=20240308T221838
   [2024-03-08, 22:18:38 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 0
   [2024-03-08, 22:18:38 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks scheduled from follow-on schedule check
   ```
   
   Unsuccessful log:
   ```
   [2024-03-08, 22:20:23 UTC] {pod.py:1057} INFO - Building pod airflow-pod-4aauulaa with labels: {'dag_id': 'example_python_pod', 'task_id': 'run_pod', 'run_id': 'manual__2024-03-08T222021.2242180000-5c0bad58f', 'kubernetes_pod_operator': 'True', 'try_number': '1'}
   [2024-03-08, 22:20:23 UTC] {taskinstance.py:2367} INFO - Pausing task as DEFERRED. dag_id=example_python_pod, task_id=run_pod, execution_date=20240308T222021, start_date=20240308T222022
   [2024-03-08, 22:20:24 UTC] {local_task_job_runner.py:231} INFO - Task exited with return code 100 (task deferral)
   [2024-03-08, 22:22:26 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: example_python_pod.run_pod manual__2024-03-08T22:20:21.224218+00:00 [queued]>
   [2024-03-08, 22:22:26 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: example_python_pod.run_pod manual__2024-03-08T22:20:21.224218+00:00 [queued]>
   [2024-03-08, 22:22:26 UTC] {taskinstance.py:2191} INFO - Resuming after deferral
   [2024-03-08, 22:22:26 UTC] {taskinstance.py:2214} INFO - Executing <Task(KubernetesPodOperator): run_pod> on 2024-03-08 22:20:21.224218+00:00
   [2024-03-08, 22:22:26 UTC] {standard_task_runner.py:60} INFO - Started process 218 to run task
   [2024-03-08, 22:22:26 UTC] {standard_task_runner.py:87} INFO - Running: ['airflow', 'tasks', 'run', 'example_python_pod', 'run_pod', 'manual__2024-03-08T22:20:21.224218+00:00', '--job-id', '470', '--raw', '--subdir', 'DAGS_FOLDER/example/example_python_pod.py', '--cfg-path', '/tmp/tmprocxttfb']
   [2024-03-08, 22:22:26 UTC] {standard_task_runner.py:88} INFO - Job 470: Subtask run_pod
   [2024-03-08, 22:22:26 UTC] {task_command.py:423} INFO - Running <TaskInstance: example_python_pod.run_pod manual__2024-03-08T22:20:21.224218+00:00 [running]> on host airflow-service-gamedev-worker-0.airflow-service-gamedev-worker.team-ecosec.svc.cluster.local
   [2024-03-08, 22:22:27 UTC] {pod_manager.py:798} INFO - Running command... if [ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo __airflow_xcom_result_empty__; fi
   [2024-03-08, 22:22:28 UTC] {pod_manager.py:798} INFO - Running command... kill -s SIGINT 1
   [2024-03-08, 22:22:28 UTC] {pod.py:559} INFO - xcom result file is empty.
   [2024-03-08, 22:22:28 UTC] {pod_manager.py:616} INFO - Pod airflow-pod-4aauulaa has phase Running
   [2024-03-08, 22:22:30 UTC] {pod_manager.py:616} INFO - Pod airflow-pod-4aauulaa has phase Running
   [2024-03-08, 22:22:32 UTC] {pod.py:914} INFO - Skipping deleting pod: airflow-pod-4aauulaa
   [2024-03-08, 22:22:32 UTC] {taskinstance.py:1149} INFO - Marking task as SUCCESS. dag_id=example_python_pod, task_id=run_pod, execution_date=20240308T222021, start_date=20240308T222022, end_date=20240308T222232
   [2024-03-08, 22:22:32 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 0
   [2024-03-08, 22:22:32 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks scheduled from follow-on schedule check
   ```
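The `pod_manager.py:798` command that appears in both runs is the sidecar read of the xcom result file; `[ -s file ]` is true only when the file exists and is non-empty, which is why both runs log "xcom result file is empty." when the task wrote nothing. A minimal Python re-creation of that logic (path and sentinel taken from the log lines above):

```python
# Re-creation of the xcom sidecar check logged at pod_manager.py:798:
#   if [ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json;
#   else echo __airflow_xcom_result_empty__; fi
import tempfile
from pathlib import Path

SENTINEL = "__airflow_xcom_result_empty__"


def read_xcom(path: Path) -> str:
    # `[ -s file ]`: file exists AND has size > 0
    if path.exists() and path.stat().st_size > 0:
        return path.read_text()
    return SENTINEL


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "return.json"
    empty.touch()  # the task pushed nothing, as in both runs above
    print(read_xcom(empty))  # __airflow_xcom_result_empty__
```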
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   The example code that is being used:
   
   ```
        KubernetesPodOperator(
            name="airflow-pod",
            task_id="run_pod",
            # forward pod logs back to Airflow for viewing
            get_logs=True,
            # output results from the pod by writing to /airflow/xcom/return.json
            do_xcom_push=True,
            # keep the pod for troubleshooting; a cleanup job will automatically remove it later
            on_finish_action="keep_pod",
            # if the pod is likely to run for an extended period of time, use deferrable=True
            deferrable=True,
            # whether we are running inside the kubernetes cluster (vs locally)
            in_cluster=True,
            # how often to check the pod status
            poll_interval=120,
            # how often to check for logs
            # logging_interval=120,
            # default is 2 minutes, which might not be enough time to pull the image and initialize the containers
            startup_timeout_seconds=300,
            cmds=["/bin/bash", "-c", "--"],
            arguments=[
                # "while true; do sleep 1; done;"
                "source /vault/secrets/env-secrets && "
                "PYTHON_PATH=/git/airflow/dags && "
                "python /git/airflow/dags/python_scripts/hello_world.py"
            ],
        )
   ```
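An aside on the `arguments` above, likely unrelated to the missing-logs symptom but worth noting: under POSIX shell semantics, `PYTHON_PATH=/git/airflow/dags && python ...` sets an unexported shell variable, so the Python child process never sees it (and CPython consults `PYTHONPATH`, not `PYTHON_PATH`, in any case). A quick check of that behaviour:

```python
# Demonstrates that `VAR=value && cmd` does not export VAR to cmd's environment;
# `export VAR=value` or `VAR=value cmd` would be needed for the child to see it.
import subprocess
import sys

probe = (
    "PYTHON_PATH=/git/airflow/dags && "
    f"{sys.executable} -c "
    "'import os; print(os.environ.get(\"PYTHON_PATH\", \"unset\"))'"
)
result = subprocess.run(["/bin/sh", "-c", probe], capture_output=True, text=True)
print(result.stdout.strip())  # "unset": the assignment stayed in the shell
```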
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

