houqp commented on issue #14422:
URL: https://github.com/apache/airflow/issues/14422#issuecomment-823028764


   @ephraimbuddy after looking more into the code, the initial change I 
proposed indeed will not work due to wrong assumptions. Here are couple things 
you need to do to expand it into a working implementation:
   
   * `self.task_instance.state = State.FAILED` only sets the state in memory, 
so you need to commit the change into database as well using 
`self.task_instance.set_state(State.FAILED)`
   * before we do the `self.task_instance.state not in State.finished` check, 
we need to call `self.task_instance.refresh_from_db()` to refresh the state 
from DB
   * `self.on_kill` calls `self.task_runner.on_finish()`, which closes 
`self.task_runner._error_file`. However, in `self.handle_task_exit`, we call 
`self.task_runner.deserialize_run_error()`, which tries to read 
`self.task_runner._error_file` again, this will cause `ValueError: seek of 
closed file` error. We need to skip the `deserialize_run_error()` call in case 
of sigterm since the sub task process won't have the chance to write to error 
file anyway. In fact, we can skip `handle_task_exit` entirely and just call 
`self.task_instance._run_finished_callback` directly within the signal handler.
   
   I pushed a simple implementation at 
https://github.com/apache/airflow/commit/e508300ef7f50434553d9b61e872316649c93df4
 for reference.
   
   SIGKILL is not catchable by design just like OOM kills sent by the kernel, 
that's why I recommended using sigkill in tests to simulate failure cases 
caused by uncatchable errors like OOM. But the local task job should be able to 
handle it because it will check the process return code and set the task state 
accordingly (currently not implemented). If we execute callbacks from within 
taskinstance process, then those callbacks will never be executed on 
uncatchable errors like OOM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to