gwind opened a new issue #8564:
URL: https://github.com/apache/airflow/issues/8564
**Description**
Add a configurable debug setting that delays pod deletion when a pod ends up
in the `Error` state.
**Use case / motivation**
I am deploying Airflow on Kubernetes with the `apache/airflow:1.10.10` image
and want to run tasks with the Kubernetes Executor.
If a pod ends up in the `Error` state, the Airflow scheduler deletes it immediately.
The pod is gone within a few seconds, so we cannot see what happened.
As a workaround, I added a `time.sleep()` loop at `kubernetes_executor.py:896`, like this:
```python
def _change_state(self, key, state, pod_id, namespace):
    if state != State.RUNNING:
        if self.kube_config.delete_worker_pods:
            for x in range(120):
                self.log.info(str(x) + ": sleep 1s for...")
                time.sleep(1)
            self.kube_scheduler.delete_pod(pod_id, namespace)
            self.log.info('Deleted pod: %s in namespace %s', str(key), str(namespace))
        try:
            self.running.pop(key)
        except KeyError:
            self.log.debug('Could not find key: %s', str(key))
    self.event_buffer[key] = state
```
When I trigger a run manually, I can see the pod enter the `Error` state shortly afterwards.
```
➜ ~ kubectl get po
NAME                                                         READY   STATUS    RESTARTS   AGE
airflow-564c84ff46-tn5mg                                     2/2     Running   0          67s
examplebashoperatorrunme0-76fd68aa96d64e8c93c7c87904f3312a   0/1     Error     0          24s
Then I can follow the pod's log:
```
➜ ~ kubectl logs -f examplebashoperatorrunme0-76fd68aa96d64e8c93c7c87904f3312a
Traceback (most recent call last):
File "/home/airflow/.local/bin/airflow", line 23, in <module>
import argcomplete
ModuleNotFoundError: No module named 'argcomplete'
```
The error is inside the container, and with the pod kept around it is now easy to debug.
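
The `time.sleep()` loop above blocks the executor for the whole delay, so a real setting would probably need to avoid that. Below is a minimal sketch of one possible approach, not Airflow code: the `DelayedPodDeleter` class, its `failed_delay_seconds` parameter, and the `request_delete` / `flush` methods are all hypothetical names invented for illustration. The idea is to record a deadline for each failed pod and delete it later, when the executor calls `flush()` from its regular loop.

```python
import time
from typing import Callable, Dict, Tuple

PodRef = Tuple[str, str]  # (pod_id, namespace)


class DelayedPodDeleter:
    """Hypothetical helper: defer deletion of failed pods without blocking."""

    def __init__(
        self,
        delete_pod: Callable[[str, str], None],
        failed_delay_seconds: float = 0.0,  # assumed config value; 0 means "delete at once"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._delete_pod = delete_pod
        self._failed_delay = failed_delay_seconds
        self._clock = clock
        self._pending: Dict[PodRef, float] = {}  # pod -> earliest deletion time

    def request_delete(self, pod_id: str, namespace: str, failed: bool) -> None:
        """Delete immediately, or queue the pod if it failed and a delay is set."""
        if failed and self._failed_delay > 0:
            self._pending[(pod_id, namespace)] = self._clock() + self._failed_delay
        else:
            self._delete_pod(pod_id, namespace)

    def flush(self) -> int:
        """Delete every queued pod whose delay has elapsed; return how many."""
        now = self._clock()
        due = [ref for ref, deadline in self._pending.items() if deadline <= now]
        for ref in due:
            del self._pending[ref]
            self._delete_pod(*ref)
        return len(due)
```

In this sketch, `_change_state` would call `request_delete(pod_id, namespace, failed=(state == State.FAILED))` instead of deleting directly, and the executor's periodic sync would call `flush()`. Both of these integration points are assumptions about where such a hook could live.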