tirkarthi opened a new issue, #35888:
URL: https://github.com/apache/airflow/issues/35888

   ### Apache Airflow version
   
   2.7.3
   
   ### What happened
   
   We are facing an issue with the Kubernetes Executor where 
`process_watcher_task` receives a `None` state and pushes it to `result_queue`. 
When the result is fetched from the queue in `kubernetes_executor.py`, it is 
passed to `_change_state`. If the state is `None`, the state is fetched from 
the database instead. When that is also `None` for some reason, 
`TaskInstanceState(state)` throws a `ValueError`, which is caught by the 
generic exception handler, and the result is added back to the queue. This 
sends the scheduler into an infinite loop trying to set the state, and we need 
to restart the scheduler to get it running again. If the state from the 
database query is also `None`, we should either not set the state, or catch 
`ValueError` separately from the generic exception handling so that the same 
result is not pushed back to the queue for retry. The validation was 
introduced by this change 
https://github.com/apache/airflow/commit/9556d6d5f611428ac8a3a5891647b720d4498ace#diff-11bb8713bf2f01502e66ffa91136f939cc8445839517187f818f044233414f7eR459
   
   
   
https://github.com/apache/airflow/blob/5d74ffb32095d534866f029d085198bc783d82c2/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py#L453-L465
   
   
https://github.com/apache/airflow/blob/f3ddefccf610833dc8d6012431f372f2af03053c/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py#L379-L393
   
   
https://github.com/apache/airflow/blob/5d74ffb32095d534866f029d085198bc783d82c2/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py#L478-L485
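
The failure mode described above can be sketched in isolation. This is a simplified model, not the actual executor code: `TaskInstanceState` here is a local stand-in enum, and `change_state`, `drain` and `max_iterations` are hypothetical names introduced only so the example is self-contained and terminates.

```python
import queue
from enum import Enum


class TaskInstanceState(str, Enum):
    """Local stand-in for Airflow's enum; only two members are modelled."""

    SUCCESS = "success"
    FAILED = "failed"


def change_state(state, db_state):
    """Hypothetical simplification of `_change_state`."""
    if state is None:
        # Fall back to the state stored in the database, which may also be None.
        state = db_state
    # Enum lookup by value raises ValueError when state is None.
    return TaskInstanceState(state)


def drain(result_queue, db_state, max_iterations=5):
    """Mimic the sync loop; max_iterations exists only so the demo stops."""
    requeues = 0
    for _ in range(max_iterations):
        try:
            result = result_queue.get_nowait()
        except queue.Empty:
            break
        try:
            change_state(result["state"], db_state)
        except Exception:
            # Generic handler: the same result goes straight back on the queue,
            # so the next iteration fails in exactly the same way.
            result_queue.put(result)
            requeues += 1
        finally:
            result_queue.task_done()
    return requeues
```

With a result whose state is `None` and a database state that is also `None`, every iteration re-queues the same item, so the loop only ends because of the artificial `max_iterations` cap.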
   
   ### What you think should happen instead
   
   The scheduler should not retry indefinitely.
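
One possible shape for this behaviour, as a minimal sketch rather than a proposed patch: skip the state change when no state is available, and handle `ValueError` separately so that only other failures are re-queued. `TaskInstanceState` is again a local stand-in enum, and `process_result` and `apply_state` are hypothetical names that do not exist in Airflow.

```python
import queue
from enum import Enum


class TaskInstanceState(str, Enum):
    """Local stand-in for Airflow's enum; only two members are modelled."""

    SUCCESS = "success"
    FAILED = "failed"


def process_result(result, db_state, result_queue, apply_state):
    """Return "applied", "skipped" or "requeued" for a single queue item."""
    # Prefer the state from the watcher, fall back to the database state.
    state = result["state"] if result["state"] is not None else db_state
    if state is None:
        # Nothing to set: retrying cannot produce a different outcome.
        return "skipped"
    try:
        apply_state(TaskInstanceState(state))
    except ValueError:
        # Invalid state value: also not worth retrying.
        return "skipped"
    except Exception:
        # Any other failure may be transient, so the result is retried.
        result_queue.put(result)
        return "requeued"
    return "applied"
```

The design choice here is that a missing or invalid state is treated as permanent and dropped, while the retry path is kept for errors that might succeed on a later attempt.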
   
   ### How to reproduce
   
   We are not sure of the exact scenario in which this is reproducible. We 
tried running a task that produces an event for which k8s returns a None 
state, which happens in rare cases when the pod is deleted or killed, and also 
deleting the task instance to make sure the db query returns None as well, but 
we are not able to consistently hit the case that causes this.
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
