dirrao commented on code in PR #36882:
URL: https://github.com/apache/airflow/pull/36882#discussion_r1463426729


##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -434,9 +434,9 @@ def sync(self) -> None:
                     )
                     self.fail(task[0], e)
                 except ApiException as e:
-                    # These codes indicate something is wrong with pod definition; otherwise we assume pod
-                    # definition is ok, and that retrying may work
-                    if e.status in (400, 422):
+                    # In case of the below error codes, fail the task and honor the task retires.
+                    # Otherwise, go for continuous/infinite retries.
+                    if e.status in (400, 403, 404, 422):

Review Comment:
   Ideally, the scheduler pool slots should be mapped to the namespace quota, 
so that we don't run into quota-exceeded exceptions. Even if the two drift 
apart over time, the customer should see the failures rather than have them 
masked, so that they can adjust the pool slots to match the quota. I believe 
this problem falls under resource-aware scheduling, and we have to think about 
it holistically across multiple executors.
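
To make the behaviour under discussion concrete, here is a minimal sketch of the status-code check proposed in the diff above. It is not the executor's actual code: `FakeApiException`, `FAIL_FAST_STATUSES`, and `handle_api_exception` are hypothetical names used only for illustration, and the sketch assumes that Kubernetes reports an exceeded namespace quota as a 403, which is why that code matters here.

```python
class FakeApiException(Exception):
    """Hypothetical stand-in for the Kubernetes client's ApiException."""

    def __init__(self, status, reason=""):
        super().__init__(f"({status}) {reason}")
        self.status = status
        self.reason = reason


# Status codes the proposed change treats as "fail the task now".
FAIL_FAST_STATUSES = frozenset({400, 403, 404, 422})


def handle_api_exception(exc, fail_fast=FAIL_FAST_STATUSES):
    """Decide what to do with a pod-creation error.

    "fail"    -> fail the task so the failure is visible and the task's
                 own retry settings apply.
    "requeue" -> put the task back on the queue and try again later,
                 with no upper bound on attempts.
    """
    return "fail" if exc.status in fail_fast else "requeue"


# Assumed example: a quota-exceeded rejection surfaces to the user.
quota_error = FakeApiException(403, "Forbidden: exceeded quota")
decision_for_quota = handle_api_exception(quota_error)

# Assumed example: a transient server error is retried by the executor.
decision_for_server_error = handle_api_exception(FakeApiException(500))
```

With this split, a quota rejection shows up as a task failure that the user can act on (for example by resizing pool slots), instead of being retried silently.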


