itsnotapt opened a new issue, #38003:
URL: https://github.com/apache/airflow/issues/38003
### Apache Airflow Provider(s)
cncf-kubernetes
### Versions of Apache Airflow Providers
_No response_
### Apache Airflow version
2.8.2
### Operating System
apache/airflow:2.8.2-python3.10
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### What happened
There seems to be roughly a 50/50 chance that the container logs from the pod
show up in the Airflow task log, even though the task succeeds either way.
I'm expecting the following:
```
[2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: /opt/airflow
[2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: Hello world!
```
Successful log:
```
[2024-03-08, 22:16:28 UTC] {pod.py:1057} INFO - Building pod
airflow-pod-uvaridfc with labels: {'dag_id': 'example_python_pod', 'task_id':
'run_pod', 'run_id': 'manual__2024-03-08T221625.9866800000-564be90aa',
'kubernetes_pod_operator': 'True', 'try_number': '1'}
[2024-03-08, 22:16:29 UTC] {taskinstance.py:2367} INFO - Pausing task as
DEFERRED. dag_id=example_python_pod, task_id=run_pod,
execution_date=20240308T221625, start_date=20240308T221627
[2024-03-08, 22:16:29 UTC] {local_task_job_runner.py:231} INFO - Task exited
with return code 100 (task deferral)
[2024-03-08, 22:18:31 UTC] {taskinstance.py:1979} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance:
example_python_pod.run_pod manual__2024-03-08T22:16:25.986680+00:00 [queued]>
[2024-03-08, 22:18:31 UTC] {taskinstance.py:1979} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance:
example_python_pod.run_pod manual__2024-03-08T22:16:25.986680+00:00 [queued]>
[2024-03-08, 22:18:31 UTC] {taskinstance.py:2191} INFO - Resuming after
deferral
[2024-03-08, 22:18:31 UTC] {taskinstance.py:2214} INFO - Executing
<Task(KubernetesPodOperator): run_pod> on 2024-03-08 22:16:25.986680+00:00
[2024-03-08, 22:18:31 UTC] {standard_task_runner.py:60} INFO - Started
process 200 to run task
[2024-03-08, 22:18:31 UTC] {standard_task_runner.py:87} INFO - Running:
['airflow', 'tasks', 'run', 'example_python_pod', 'run_pod',
'manual__2024-03-08T22:16:25.986680+00:00', '--job-id', '468', '--raw',
'--subdir', 'DAGS_FOLDER/example/example_python_pod.py', '--cfg-path',
'/tmp/tmpov5tx0h_']
[2024-03-08, 22:18:31 UTC] {standard_task_runner.py:88} INFO - Job 468:
Subtask run_pod
[2024-03-08, 22:18:31 UTC] {task_command.py:423} INFO - Running
<TaskInstance: example_python_pod.run_pod
manual__2024-03-08T22:16:25.986680+00:00 [running]> on host
airflow-service-gamedev-worker-0.airflow-service-gamedev-worker.team-ecosec.svc.cluster.local
[2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: /opt/airflow
[2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: Hello world!
[2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs:
[2024-03-08, 22:18:33 UTC] {pod_manager.py:798} INFO - Running command... if
[ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo
__airflow_xcom_result_empty__; fi
[2024-03-08, 22:18:33 UTC] {pod_manager.py:798} INFO - Running command...
kill -s SIGINT 1
[2024-03-08, 22:18:34 UTC] {pod.py:559} INFO - xcom result file is empty.
[2024-03-08, 22:18:34 UTC] {pod_manager.py:616} INFO - Pod
airflow-pod-uvaridfc has phase Running
[2024-03-08, 22:18:36 UTC] {pod_manager.py:616} INFO - Pod
airflow-pod-uvaridfc has phase Running
[2024-03-08, 22:18:38 UTC] {pod.py:914} INFO - Skipping deleting pod:
airflow-pod-uvaridfc
[2024-03-08, 22:18:38 UTC] {taskinstance.py:1149} INFO - Marking task as
SUCCESS. dag_id=example_python_pod, task_id=run_pod,
execution_date=20240308T221625, start_date=20240308T221627,
end_date=20240308T221838
[2024-03-08, 22:18:38 UTC] {local_task_job_runner.py:234} INFO - Task exited
with return code 0
[2024-03-08, 22:18:38 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks
scheduled from follow-on schedule check
```
Unsuccessful log (no `Container logs:` lines appear):
```
[2024-03-08, 22:20:23 UTC] {pod.py:1057} INFO - Building pod
airflow-pod-4aauulaa with labels: {'dag_id': 'example_python_pod', 'task_id':
'run_pod', 'run_id': 'manual__2024-03-08T222021.2242180000-5c0bad58f',
'kubernetes_pod_operator': 'True', 'try_number': '1'}
[2024-03-08, 22:20:23 UTC] {taskinstance.py:2367} INFO - Pausing task as
DEFERRED. dag_id=example_python_pod, task_id=run_pod,
execution_date=20240308T222021, start_date=20240308T222022
[2024-03-08, 22:20:24 UTC] {local_task_job_runner.py:231} INFO - Task exited
with return code 100 (task deferral)
[2024-03-08, 22:22:26 UTC] {taskinstance.py:1979} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance:
example_python_pod.run_pod manual__2024-03-08T22:20:21.224218+00:00 [queued]>
[2024-03-08, 22:22:26 UTC] {taskinstance.py:1979} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance:
example_python_pod.run_pod manual__2024-03-08T22:20:21.224218+00:00 [queued]>
[2024-03-08, 22:22:26 UTC] {taskinstance.py:2191} INFO - Resuming after
deferral
[2024-03-08, 22:22:26 UTC] {taskinstance.py:2214} INFO - Executing
<Task(KubernetesPodOperator): run_pod> on 2024-03-08 22:20:21.224218+00:00
[2024-03-08, 22:22:26 UTC] {standard_task_runner.py:60} INFO - Started
process 218 to run task
[2024-03-08, 22:22:26 UTC] {standard_task_runner.py:87} INFO - Running:
['airflow', 'tasks', 'run', 'example_python_pod', 'run_pod',
'manual__2024-03-08T22:20:21.224218+00:00', '--job-id', '470', '--raw',
'--subdir', 'DAGS_FOLDER/example/example_python_pod.py', '--cfg-path',
'/tmp/tmprocxttfb']
[2024-03-08, 22:22:26 UTC] {standard_task_runner.py:88} INFO - Job 470:
Subtask run_pod
[2024-03-08, 22:22:26 UTC] {task_command.py:423} INFO - Running
<TaskInstance: example_python_pod.run_pod
manual__2024-03-08T22:20:21.224218+00:00 [running]> on host
airflow-service-gamedev-worker-0.airflow-service-gamedev-worker.team-ecosec.svc.cluster.local
[2024-03-08, 22:22:27 UTC] {pod_manager.py:798} INFO - Running command... if
[ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo
__airflow_xcom_result_empty__; fi
[2024-03-08, 22:22:28 UTC] {pod_manager.py:798} INFO - Running command...
kill -s SIGINT 1
[2024-03-08, 22:22:28 UTC] {pod.py:559} INFO - xcom result file is empty.
[2024-03-08, 22:22:28 UTC] {pod_manager.py:616} INFO - Pod
airflow-pod-4aauulaa has phase Running
[2024-03-08, 22:22:30 UTC] {pod_manager.py:616} INFO - Pod
airflow-pod-4aauulaa has phase Running
[2024-03-08, 22:22:32 UTC] {pod.py:914} INFO - Skipping deleting pod:
airflow-pod-4aauulaa
[2024-03-08, 22:22:32 UTC] {taskinstance.py:1149} INFO - Marking task as
SUCCESS. dag_id=example_python_pod, task_id=run_pod,
execution_date=20240308T222021, start_date=20240308T222022,
end_date=20240308T222232
[2024-03-08, 22:22:32 UTC] {local_task_job_runner.py:234} INFO - Task exited
with return code 0
[2024-03-08, 22:22:32 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks
scheduled from follow-on schedule check
```
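To tell the two cases apart without reading each log by eye, a small check along these lines might help. It is only a sketch: the helper names are made up for illustration, and the pattern assumes the `{pod.py:NNN} INFO - Container logs:` line format seen in the logs above, which may differ in other provider versions.

```python
import re

# Matches the "Container logs:" lines seen in the task logs above, e.g.
# "[2024-03-08, 22:18:33 UTC] {pod.py:778} INFO - Container logs: Hello world!"
# Assumption: the line format is the one shown in this report.
CONTAINER_LOG_RE = re.compile(r"\{pod\.py:\d+\} INFO - Container logs: ?(.*)$")


def container_log_lines(task_log: str) -> list[str]:
    """Return the non-empty container output lines found in an Airflow task log."""
    found = []
    for line in task_log.splitlines():
        match = CONTAINER_LOG_RE.search(line)
        if match and match.group(1).strip():
            found.append(match.group(1).strip())
    return found


def has_container_logs(task_log: str) -> bool:
    """True if the task log contains at least one non-empty container log line."""
    return bool(container_log_lines(task_log))
```

Run against the text of a downloaded task log, `has_container_logs` would return `True` for the successful run above and `False` for the unsuccessful one.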
### What you think should happen instead
_No response_
### How to reproduce
The example code that is being used:
```
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

KubernetesPodOperator(
    name="airflow-pod",
    task_id="run_pod",
    # forward pod logs back to Airflow for viewing
    get_logs=True,
    # output results from the pod by writing to /airflow/xcom/return.json
    do_xcom_push=True,
    # keep the pod for troubleshooting; a cleanup job will automatically remove it later
    on_finish_action="keep_pod",
    # if the pod is likely to run for an extended period of time, use deferrable=True
    deferrable=True,
    # if running within kubernetes cluster vs local
    in_cluster=True,
    # how often (in seconds) to check the pod status
    poll_interval=120,
    # how often to check for logs
    # logging_interval=120,
    # default is 2 minutes; this might not be enough time to get the image
    # and initialize the containers
    startup_timeout_seconds=300,
    cmds=["/bin/bash", "-c", "--"],
    arguments=[
        # "while true; do sleep 1; done;"
        "source /vault/secrets/env-secrets && "
        "PYTHON_PATH=/git/airflow/dags && "
        "python /git/airflow/dags/python_scripts/hello_world.py"
    ],
)
```
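The `hello_world.py` script itself is not included in this report. Judging only from the two expected log lines, a stand-in like the following might be enough to reproduce. It is an assumption that the first line is the working directory; the real script may do something else.

```python
import os


def main() -> list[str]:
    """Print the working directory and a greeting; return the printed lines."""
    lines = [os.getcwd(), "Hello world!"]
    for line in lines:
        # flush so the output reaches the container's stdout right away
        print(line, flush=True)
    return lines


if __name__ == "__main__":
    main()
```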
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)