hussein-awala commented on code in PR #36882:
URL: https://github.com/apache/airflow/pull/36882#discussion_r1468587468
##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:
##########
@@ -434,9 +434,9 @@ def sync(self) -> None:
)
self.fail(task[0], e)
except ApiException as e:
- # These codes indicate something is wrong with pod definition; otherwise we assume pod
- # definition is ok, and that retrying may work
- if e.status in (400, 422):
+ # In case of the below error codes, fail the task and honor the task retires.
+ # Otherwise, go for continuous/infinite retries.
+ if e.status in (400, 403, 404, 422):
Review Comment:
> Not necessarily. The pod could be too large to ever be created, and it's still stuck in the loop forever.

Yeah, the pod may or may not be too large, but with a new config we can let the
user decide what to do. I'm fine with a new config similar to
`task_publish_max_retries`, even if it defaults to 0; at least the user could
then raise it to a higher value.
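
To make the suggestion concrete, here is a minimal sketch of a capped requeue loop. It is not the executor's code: `should_requeue`, `publish_with_cap`, the `create_pod` callable, and the use of `RuntimeError` in place of the Kubernetes `ApiException` are all hypothetical, and it assumes that a cap of 0 means "never retry".

```python
def should_requeue(attempts_made: int, max_retries: int = 0) -> bool:
    """Return True if a task whose pod creation failed should be queued again.

    attempts_made: retries already spent on this task (hypothetical counter).
    max_retries: user-set cap; assumed here that 0 means "never retry".
    """
    return attempts_made < max_retries


def publish_with_cap(create_pod, max_retries: int = 0) -> str:
    """Call create_pod until it succeeds or the retry budget is spent."""
    attempts_made = 0
    while True:
        try:
            create_pod()
            return "created"
        except RuntimeError:  # stand-in for the Kubernetes ApiException
            if not should_requeue(attempts_made, max_retries):
                # Budget exhausted: fail the task so that the normal
                # task-level retries can take over.
                return "failed"
            attempts_made += 1
```

With the default of 0 the first failure ends the loop, so the behaviour stays the same until the user opts in; a pod that can never be created stops being retried once the cap is reached instead of looping forever.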
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.