aiqc opened a new issue, #38298:
URL: https://github.com/apache/airflow/issues/38298

   ### Description
   
   I am running AWS Batch jobs using the BatchOperator.
   
   When I use Spot Instances, sometimes the job gets stuck in a "Runnable" 
status in the AWS Batch Queue. Probably because there are no spot capacity 
instances available. Eventually, the job will show `statusReason`:`UNDETERMINED 
- Batch job is blocked, root cause is undetermined.`
   
   Meanwhile, Airflow shows these stuck jobs in the green "running" state and 
keeps polling them. I have both `retries` and `retry_delay` defined, but I 
don't think they are being triggered because of this running state.
   
   
   
   ### Use case/motivation
   
   Cost savings -- I'd like a bit more retry functionality baked into the 
BatchOperator so that I can run spot instance jobs affordably
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to