potiuk commented on issue #35474: URL: https://github.com/apache/airflow/issues/35474#issuecomment-1801807403
How about another option? I think we already use fork-based local task process execution (it depends on the runner, since the process could also be spawned and cgroups might be involved, but fork is generally the default). I believe that when a task is run, there is one main process (LocalTaskJob) that watches the "child" process and regularly pings the DB with a heartbeat (or so I understand it), while the child process does the actual job.

I think that parent process does not actually parse the task to know the timeout, so that's a bit of a problem, but we could POTENTIALLY modify the scheduler (which knows the timeout from the serialized DAG) to pass that timeout to the executor (and subsequently to task execution) as an additional parameter. Then, assuming that the parent process does not get into a long-running C job and does not hang, it would be relatively easy to do task kill escalation: the usual SIGTERM, SIGHUP, SIGKILL dance, with SIGKILL ultimately killing even the most stubborn forked processes.

The parent process is not doing much. It merely communicates with the Airflow DB via heartbeats (as I understand it) and waits for the forked process to finish, so chances that it will hang are slim. That would be a pretty robust solution, I think?
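To make the escalation idea concrete, here is a minimal sketch of a parent process that forks a child, waits up to a timeout, and then walks through SIGTERM, SIGHUP and SIGKILL. It assumes a POSIX system with `os.fork`. The names `run_with_timeout` and `wait_for_exit` and the `grace` period are hypothetical and not Airflow APIs, and the heartbeat to the DB is left out entirely.

```python
import os
import signal
import time

# Escalation order from the comment; the per-signal grace period is an assumption.
ESCALATION = (signal.SIGTERM, signal.SIGHUP, signal.SIGKILL)


def wait_for_exit(pid, timeout):
    """Poll the child until it exits or `timeout` seconds pass.

    Returns the raw wait status, or None if the child is still running.
    """
    deadline = time.monotonic() + timeout
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return status
        if time.monotonic() >= deadline:
            return None
        time.sleep(0.05)


def run_with_timeout(task, timeout, grace=1.0):
    """Run `task` in a forked child and kill it with escalating signals on timeout.

    Returns (wait_status, last_signal_sent); last_signal_sent is None if the
    child finished on its own within `timeout`.
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit without running the parent's cleanup code.
        try:
            task()
        except BaseException:
            os._exit(1)
        os._exit(0)

    # Parent: it only waits, so it is unlikely to hang itself.
    status = wait_for_exit(pid, timeout)
    if status is not None:
        return status, None

    for sig in ESCALATION:
        os.kill(pid, sig)
        status = wait_for_exit(pid, grace)
        if status is not None:
            return status, sig
    raise RuntimeError(f"child {pid} survived SIGKILL within the grace period")
```

In this sketch the parent never executes task code, which is the property the proposal relies on: even a child that ignores SIGTERM and SIGHUP cannot block or ignore SIGKILL. In a real implementation the timeout would have to be passed in from the scheduler, as described above, rather than supplied by the caller.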
