houqp edited a comment on issue #14422: URL: https://github.com/apache/airflow/issues/14422#issuecomment-823028764
@ephraimbuddy after looking more into the code, the initial change I proposed indeed will not work due to wrong assumptions. Here are couple things you need to do to expand it into a working implementation: * `self.task_instance.state = State.FAILED` only sets the state in memory, so you need to commit the change into database as well using `self.task_instance.set_state(State.FAILED)` * before we do the `self.task_instance.state not in State.finished` check, we need to call `self.task_instance.refresh_from_db()` to refresh the state from DB * `self.on_kill` calls `self.task_runner.on_finish()`, which closes `self.task_runner._error_file`. However, in `self.handle_task_exit`, we call `self.task_runner.deserialize_run_error()`, which tries to read `self.task_runner._error_file` again, this will cause `ValueError: seek of closed file` error. We need to skip the `deserialize_run_error()` call in case of sigterm since the sub task process won't have the chance to write to error file anyway. In fact, we can skip `handle_task_exit` entirely and just call `self.task_instance._run_finished_callback` directly within the signal handler. I pushed a simple implementation at https://github.com/apache/airflow/commit/9dea51aec4c45a2fe7d35ccb624fe3b4bdb4dc0f for reference. SIGKILL is not catchable by design just like OOM kills sent by the kernel, that's why I recommended using sigkill in tests to simulate failure cases caused by uncatchable errors like OOM. But the local task job should be able to handle it because it will check the process return code and set the task state accordingly (currently not implemented). If we execute callbacks from within taskinstance subprocess, then those callbacks will never be executed on uncatchable errors like OOM. This is why in my comment in #15172, I mentioned that design will make it impossible to fix https://github.com/apache/airflow/issues/11086. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
