potiuk commented on issue #17507:
URL: https://github.com/apache/airflow/issues/17507#issuecomment-1538876522

   > Note that since (my) patch was applied that fixed the race condition, I 
have occasionally seen this error when the process was killed for another 
reason - for example, we have OS monitors that will kill processes that are 
being bad citizens, or certain times when the task had an unexpected exception 
and died by itself.
   
   Precisely. What you explain is what I suspected: a very rare, externally 
triggered event. That's how it looks from the logs. It looks like something 
killed a bunch of running tasks, but the original local task jobs were not 
killed, and they then complained about those "child" processes being missing. 
If this happens only occasionally, as a result of some uncontrolled killing of 
processes, I would be for just closing this one. We are not able to handle all 
the scenarios where something randomly kills some processes. Airflow is not a 
99.999% available system that is supposed to handle absolutely all such 
situations; developing such systems is extremely costly, and there is little 
incentive to spend a lot of time perfecting it when there is a nice UI and 
monitoring that can warn about such situations and let a human fix them by 
re-running the tasks.
   

