ashb commented on code in PR #55767:
URL: https://github.com/apache/airflow/pull/55767#discussion_r2355346941
##########
task-sdk/src/airflow/sdk/execution_time/supervisor.py:
##########
@@ -1147,8 +1151,22 @@ def final_state(self):
return self._terminal_state or TaskInstanceState.SUCCESS
if self._exit_code != 0 and self._terminal_state == SERVER_TERMINATED:
return SERVER_TERMINATED
+
+ if self._exit_code != 0 and self._is_signal_retryable() and
self._should_retry:
Review Comment:
```suggestion
if and self._is_signal_retryable() and self._should_retry:
```
We already know the exit code is not 0 from the first condition of this fn
on L1150.
##########
task-sdk/src/airflow/sdk/execution_time/supervisor.py:
##########
@@ -1147,8 +1151,22 @@ def final_state(self):
return self._terminal_state or TaskInstanceState.SUCCESS
if self._exit_code != 0 and self._terminal_state == SERVER_TERMINATED:
return SERVER_TERMINATED
+
+ if self._exit_code != 0 and self._is_signal_retryable() and
self._should_retry:
+ return TaskInstanceState.UP_FOR_RETRY
+
return TaskInstanceState.FAILED
+ def _is_signal_retryable(self) -> bool:
+ """Check if the exit code signal can be retried."""
+ if self._exit_code is None:
+ return False
+
Review Comment:
```suggestion
```
Not needed since we never call it with None, and also not needed since the
compare on the next lines will just result in False anyway
##########
task-sdk/src/airflow/sdk/execution_time/supervisor.py:
##########
@@ -1147,8 +1151,22 @@ def final_state(self):
return self._terminal_state or TaskInstanceState.SUCCESS
if self._exit_code != 0 and self._terminal_state == SERVER_TERMINATED:
return SERVER_TERMINATED
+
+ if self._exit_code != 0 and self._is_signal_retryable() and
self._should_retry:
+ return TaskInstanceState.UP_FOR_RETRY
Review Comment:
This feels a little bit odd to me here.
My first thought was that we should do this inside the place where we first
notice the signal exit code i.e. inside `_check_subprocess_exit` But I'm not
sure.
##########
task-sdk/src/airflow/sdk/execution_time/supervisor.py:
##########
@@ -1147,8 +1151,22 @@ def final_state(self):
return self._terminal_state or TaskInstanceState.SUCCESS
if self._exit_code != 0 and self._terminal_state == SERVER_TERMINATED:
return SERVER_TERMINATED
+
+ if self._exit_code != 0 and self._is_signal_retryable() and
self._should_retry:
+ return TaskInstanceState.UP_FOR_RETRY
+
return TaskInstanceState.FAILED
+ def _is_signal_retryable(self) -> bool:
+ """Check if the exit code signal can be retried."""
+ if self._exit_code is None:
+ return False
+
+ if self._exit_code == -signal.SIGKILL or self._exit_code ==
-signal.SIGTERM:
+ return True
Review Comment:
Why only sigkill and sigterm? I think any signal that kills the task should
be up for a retry -- SIGSEGV (seg fault) for instance could just as likely be
retried.
##########
task-sdk/src/airflow/sdk/execution_time/supervisor.py:
##########
@@ -1147,8 +1151,22 @@ def final_state(self):
return self._terminal_state or TaskInstanceState.SUCCESS
if self._exit_code != 0 and self._terminal_state == SERVER_TERMINATED:
return SERVER_TERMINATED
+
+ if self._exit_code != 0 and self._is_signal_retryable() and
self._should_retry:
+ return TaskInstanceState.UP_FOR_RETRY
+
return TaskInstanceState.FAILED
+ def _is_signal_retryable(self) -> bool:
+ """Check if the exit code signal can be retried."""
+ if self._exit_code is None:
+ return False
+
+ if self._exit_code == -signal.SIGKILL or self._exit_code ==
-signal.SIGTERM:
+ return True
+
+ return False
Review Comment:
Also:
```suggestion
return self._exit_code in (-signal.SIGKILL, -signal.SIGTERM)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]