potiuk commented on issue #17507: URL: https://github.com/apache/airflow/issues/17507#issuecomment-1538876522
> Note that since (my) patch was applied that fixed the race condition, I have occasionally seen this error when the process was killed for another reason - for example, we have OS monitors that will kill processes that are being bad citizens, or certain times when the task had an unexpected exception and died by itself.

Precisely. What you explain is what I suspected: a very rare event that is externally triggered. That's what it looks like from the logs. It actually looks like something killed a bunch of running tasks while the original local task jobs were not killed, and those jobs then complained that their "child" processes were missing.

If that happens only occasionally, as a result of some unbounded killing of processes, I would be for just closing this one. We are not able to handle every scenario in which something randomly kills some processes. Airflow is not a 99.999% available system that is supposed to handle absolutely all such situations. Such systems are extremely costly to develop, and there is little incentive to spend a lot of time perfecting this when there is a nice UI and monitoring that can warn about such situations and let a human fix them by re-running the tasks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
