hussein-awala commented on code in PR #36882:
URL: https://github.com/apache/airflow/pull/36882#discussion_r1460539740
##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -434,9 +434,9 @@ def sync(self) -> None:
)
self.fail(task[0], e)
except ApiException as e:
- # These codes indicate something is wrong with pod definition; otherwise we assume pod
- # definition is ok, and that retrying may work
- if e.status in (400, 422):
+ # In case of the below error codes, fail the task and honor the task retires.
+ # Otherwise, go for continuous/infinite retries.
+ if e.status in (400, 403, 404, 422):
Review Comment:
> it needs at least a few minutes for quota to be available

Then we can also support a backoff retry strategy with a maximum number of
retries. I'm just trying to avoid failing an Airflow task because of temporary
pressure on the cluster, especially if users configure a small number of
retries for the task. Raising the task's retry count just to survive an
exceeded quota will lead to useless attempts when the problem is in the task
itself.
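
To make the suggestion concrete, here is a minimal sketch of what a capped backoff could look like. It is not the executor's current code: `next_publish_delay`, the constants, and their values are all hypothetical, and it only assumes that the executor can count how many times it has tried to create the pod for a given task.

```python
from typing import Optional

# Hypothetical values for illustration; these are not existing Airflow settings.
FATAL_STATUSES = frozenset({400, 422})  # bad pod definition: retrying cannot help
BASE_DELAY_SECONDS = 30.0               # wait after the first failed creation
MAX_DELAY_SECONDS = 600.0               # upper bound for a single wait
MAX_PUBLISH_ATTEMPTS = 5                # give up and fail the task after this many


def next_publish_delay(status: int, attempt: int) -> Optional[float]:
    """Return the seconds to wait before re-queuing pod creation, or None to fail the task.

    `status` is the HTTP status of the failed API call and `attempt` is the number
    of creation attempts already made (1 after the first failure).
    """
    if status in FATAL_STATUSES:
        return None
    if attempt >= MAX_PUBLISH_ATTEMPTS:
        return None
    # Exponential backoff: 30s, 60s, 120s, ... capped at MAX_DELAY_SECONDS.
    return min(BASE_DELAY_SECONDS * 2 ** (attempt - 1), MAX_DELAY_SECONDS)
```

With these example values, a 403 caused by an exceeded quota would be retried for several minutes in total before the task is failed, without consuming any of the task's own retries, while a 400 or 422 still fails immediately.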