jscheffl commented on code in PR #61778:
URL: https://github.com/apache/airflow/pull/61778#discussion_r2814122409
##########
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/triggers/pod.py:
##########
@@ -183,7 +184,7 @@ async def run(self) -> AsyncIterator[TriggerEvent]:
event = await self._wait_for_container_completion()
yield event
return
- except PodLaunchTimeoutException as e:
+ except (PodLaunchTimeoutException, PodLaunchFailedException) as e:
Review Comment:
Uff, looks like this is a more complex discussion :-D
So normally (and this was a feature) I expect the PodOperator to terminate
"early" and clean up on `ImagePullBackOff`; otherwise it is stuck forever. That
feature was added some months ago.
I would see this contribution as adding an exception to that rule: if the pod
is in `ImagePullBackOff` but the reason in the text somehow points to a rate
limit (not sure whether good text matching needs to be added, or whether there
is a machine-readable error code?), then we should _not_ fail fast. In this
case I'd assume the image will be available later. If it is not, the pod will
finally fail at startup_timeout.
That said, in my view it does not make sense to send the task back to the
worker if we "just wait" for the rate limit to resolve and the pull to
complete. That is what a Triggerer is for: waiting patiently, maybe logging
some details about why.