dirrao commented on code in PR #36882:
URL: https://github.com/apache/airflow/pull/36882#discussion_r1459273598
##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -436,7 +436,7 @@ def sync(self) -> None:
             except ApiException as e:
                 # These codes indicate something is wrong with pod definition;
                 # otherwise we assume pod definition is ok, and that retrying may work
-                if e.status in (400, 422):
+                if e.status in (400, 403, 404, 422):
Review Comment:
> Do you think that there is a valid use case where we want to keep retrying whenever we get a 403 Forbidden?
> I'll give you an example use case I thought about:
> Let's say you have a quota in your namespace, and while trying to run the task you fail with an exceeded-quota error.
> Maybe the user will want the executor to retry until resources are freed up.
> Let me know what you think about this use case.
Yes, we thought about that. That's what's happening right now. Imagine all the tasks being scheduled at the same time: they will end up retrying for a long period, degrading scheduler performance. And if Airflow is deployed for multi-tenant use cases, one tenant's retries will impact the other tenants. It makes sense to retry after 5 minutes instead of continuously.
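To make the policy concrete, here is a minimal standalone sketch of the fatal-vs-retryable classification being discussed. The `ApiException` stand-in class and the `handle_pod_launch_error`, `fail_pod`, and `requeue_pod` names are hypothetical, introduced only for illustration; the real executor logic lives in `kubernetes_executor.py`:

```python
# Status codes treated as fatal after this PR: the pod definition (or the
# namespace's permissions/quota) will not fix itself by blindly retrying.
FATAL_STATUS_CODES = {400, 403, 404, 422}


class ApiException(Exception):
    """Hypothetical stand-in for kubernetes.client.rest.ApiException."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def handle_pod_launch_error(e: ApiException, fail_pod, requeue_pod) -> None:
    """Route a pod-launch ApiException to a fatal or retryable path."""
    if e.status in FATAL_STATUS_CODES:
        # Bad request / forbidden / not found / unprocessable entity:
        # fail the task instead of hammering the API server.
        fail_pod(e)
    else:
        # Transient server-side errors (e.g. 500, 429): requeue and
        # retry later, on the assumption the condition may clear.
        requeue_pod(e)
```

Under this sketch a 403 (e.g. exceeded quota) fails fast rather than retrying in a tight loop, which is the scheduler-performance argument above; a periodic retry with a delay would be layered on top by the task-level retry mechanism.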
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]