pcolladosoto opened a new pull request #22685: URL: https://github.com/apache/airflow/pull/22685
This PR fixes the cleanup procedures of the `DAGFileProcessor`s being spawned by the `DagFileProcessorManager` instance. Thus, this PR closes #22191. We came across this bug when encountering a timeout triggering the forceful killing of these `DAGFileProcessor`s. We have had somewhat lengthy discussion on an issue (#22191) which we encourage anyone to read for some more insight into the cause, discovery and posterior fix. People on that thread were extremely helpful! 😸 We **have not** included tests because we're unsure of how to test a behaviour such as this one. If pointed in the right direction we would be more than happy to add them. We can say however that we applied these changes on our own production Airflow instance and we haven't encountered the issue ever since. On top of that we would be more than welcome if you made any suggestions to the code: for instance, when cleaning up a dictionary we're iterating over we decided to take note of the problematic `DAGFileProcessor`s to then remove them on a second `for` loop. Our background in programming is much stronger on some other languages and so we feel really uncomfortable pushing Python 'to the limit' in terms of relying on its implementation to make design choices. If there's anything that can be done better by all means say so. Another fertile topic for discussion is how to `wait()` for the processes being killed through the `SIGKILL` signal. This has been brought up by @dlesco on #22191 and we agree with him on adding an optional timeout to the operation to avoid blocking in very bizarre circumstances (which the current solution would do). However, we decided to contribute our initial approach and then iterate on solutions within the PR. Thanks a lot for letting me contribute to a tool with the expertise and size of Airflow: it's truly an honour. Hope to make the merge as seamless as possible 😜 closes: #22191 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
