potiuk commented on issue #35474:
URL: https://github.com/apache/airflow/issues/35474#issuecomment-1801807403

   How about another option? I think we already use fork-based local task 
process execution (it depends on the runner - the process could also be 
spawned and cgroups might be involved - but generally fork is the default). I 
believe that when a task is run, there is one main process (LocalTaskJob) that 
watches the "child" process and regularly pings the DB with a heartbeat (or so 
I understand), while the child process is doing the job. I think that the 
parent process does not actually parse the task, so it does not know the 
timeout (that's a bit of a problem), but we could POTENTIALLY modify the 
scheduler (which knows the timeout from the serialized DAG) to pass that 
timeout to the executor (and subsequently to task execution) as an additional 
parameter.
   
   Then, assuming that this parent process does not get into long-running C 
code and hang, it would be relatively easy to do task kill escalation - the 
usual SIGTERM, SIGHUP, SIGKILL dance, with SIGKILL ultimately killing even the 
most stubborn forked processes. The parent process is not doing much: it merely 
communicates with the Airflow DB via heartbeats (as I understand it) and waits 
for the forked process to finish, so the chances that this process will hang 
are slim.
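   
   A minimal sketch of that escalation, assuming a POSIX system and a parent 
that already receives the timeout as a parameter. This is not Airflow's actual 
LocalTaskJob code: `run_with_timeout`, `_reap`, `grace` and the `heartbeat` 
callback are hypothetical names used only for illustration, and the heartbeat 
stands in for whatever the real parent does to ping the DB.

```python
import os
import signal
import time

# Escalation order as described in the comment above (an assumption, not
# taken from Airflow's source).
ESCALATION = (signal.SIGTERM, signal.SIGHUP, signal.SIGKILL)


def _reap(pid, wait_seconds):
    """Poll the child for up to wait_seconds.

    Returns the raw wait status once the child has exited, or None if it is
    still running when the wait expires.
    """
    deadline = time.monotonic() + wait_seconds
    while True:
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return status
        if time.monotonic() >= deadline:
            return None
        time.sleep(0.05)


def run_with_timeout(task, timeout, grace=1.0, heartbeat=lambda: None):
    """Fork `task`, supervise it, and escalate signals after `timeout` seconds.

    Returns (timed_out, last_signal_sent_or_None).
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit without returning into parent code.
        try:
            task()
        finally:
            os._exit(0)

    # Parent: only heartbeats and waits, so it is unlikely to hang itself.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        heartbeat()  # hypothetical stand-in for the DB heartbeat
        if _reap(pid, 0.1) is not None:
            return False, None

    # Timeout expired: send each signal in turn, giving the child `grace`
    # seconds to exit before moving on to the next one.
    for sig in ESCALATION:
        os.kill(pid, sig)
        if _reap(pid, grace) is not None:
            return True, sig
    raise RuntimeError("child process survived SIGKILL")
```

   In this sketch, a child that ignores SIGTERM and SIGHUP is still removed by 
the final SIGKILL, which cannot be caught or ignored. The parent never runs the 
task code itself, so a child stuck in C code does not block the escalation.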
   
   That would be a pretty robust solution, I think?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
