krisfur commented on code in PR #34627:
URL: https://github.com/apache/airflow/pull/34627#discussion_r1338860691
##########
airflow/providers/microsoft/azure/operators/container_instances.py:
##########
@@ -319,6 +319,9 @@ def _monitor_logging(self, resource_group: str, name: str)
-> int:
if state in ["Running", "Terminated", "Succeeded"]:
try:
logs = self._ci_hook.get_logs(resource_group, name)
+ if logs is None:
+ self.log.exception("Container log is broken,
marking as failed.")
Review Comment:
I agree, it is possible that returning None logs is just a symptom of a more
complex failure of getting something from the instance view or beyond? I tried
debugging those bits but fell short of finding anything useful.
The solution in this PR stops the results of this issue from causing an
infinite loop of restarts (which incurs costs if it happens in prod), but I
agree it probably doesn't resolve the underlying logic problem, which I fail to
investigate fully with my limited skills of a data engineer and limited
knowledge of the airflow code.
Another solution (which worked) that I tested is checking if last_state ==
Running and state == Waiting which also catches the problem, but this state
change only happens after those errors from None logs are thrown, so stopping
it at the error point seems better.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]