AutomationDev85 commented on code in PR #58033:
URL: https://github.com/apache/airflow/pull/58033#discussion_r2510431985


##########
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py:
##########
@@ -78,7 +78,8 @@ class PodLaunchFailedException(AirflowException):
 def should_retry_start_pod(exception: BaseException) -> bool:
     """Check if an Exception indicates a transient error and warrants retrying."""
     if isinstance(exception, ApiException):
-        return str(exception.status) == "409"
+        # Retry on status code 409 (Conflict) or 429 (Too Many Requests)
+        return str(exception.status) in {"409", "429"}

Review Comment:
   @jscheffl Thank you for the hint. I wasn't aware of this.
   I've reworked the retry logic to make it more generic, so the pod manager 
now retries most requests in a consistent way. It also reads its limits from 
the Airflow config value, so the max and min values are no longer hard-coded 
in the code.
   For create_pod, I kept the existing behavior of retrying only on 409 and 
429 errors: I'm not sure why this restriction was originally added, and I 
don't want to break anything else.
   
   Looking forward to your feedback!
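
For readers following the thread, the retry behavior described in this comment could be sketched roughly as below. This is a minimal illustration, not the code from the PR: `ApiException` here is a local stand-in for the Kubernetes client exception, `call_with_retries` and its parameters (`max_attempts`, `base_delay`, `sleep`) are hypothetical names, and returning `False` for non-API exceptions is an assumption because the diff hunk does not show that branch. In the real code the attempt limit would come from the Airflow config value mentioned above, whose key is not named in this comment.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiException(Exception):
    """Local stand-in for the Kubernetes client's ApiException (only .status is modeled)."""

    def __init__(self, status: int) -> None:
        super().__init__(f"API error {status}")
        self.status = status


def should_retry_start_pod(exception: BaseException) -> bool:
    """Retry only on 409 (Conflict) or 429 (Too Many Requests), as in the diff."""
    if isinstance(exception, ApiException):
        return str(exception.status) in {"409", "429"}
    # Assumption: anything that is not an API error is treated as non-transient.
    return False


def call_with_retries(
    func: Callable[[], T],
    should_retry: Callable[[BaseException], bool],
    max_attempts: int = 3,  # hypothetical; would be read from config in practice
    base_delay: float = 0.5,  # hypothetical starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff while should_retry allows it."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == max_attempts or not should_retry(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing the predicate in as an argument is one way to keep a narrow rule for create_pod (409 and 429 only) while other requests use a more generic predicate with the same wrapper.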



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
