ashb commented on issue #6627: [AIRFLOW-5931] Use os.fork when appropriate to speed up task execution.
URL: https://github.com/apache/airflow/pull/6627#issuecomment-557485763

> Afaik it does Executor -> Task -> Rawtask. So with your change it would now do "Executor -> Task -> Rawtask -> New Process"? I.e. it hasn't become part of the executor I assume (that would be a no go). Just verifying.

Not quite. The existing flow is:

Executor -> exec to spawn new python to run Task "watcher" -> spawn new python to run actual Task

My PR changes this to:

Executor -> exec to spawn new python to run Task "watcher" -> fork to run actual Task

The number of processes in use remains the same -- the only difference is how we create the processes, and whether we have to reload all of python and the airflow modules or not. I am happy that the same semantics and isolation are maintained.

We could also look at merging the "watcher" into the executor -- the main thing the watcher does is set the task to failed if it errors, or kill it if the TaskInstance state is changed externally (i.e. the watcher is what is responsible for sending a term/kill signal to the task when you clear it in the UI). And yes, longer term I also want to stop the workers accessing the DB directly.

(Once this is merged/working I plan to fix the Local and Celery executors to tackle the exec vs fork there too.)

I'll try using multiprocessing to do this.

> Note that in python 3.8 default mode for the new process is spawn as forking on MacOS might cause crashes because threads are not safe for forking and some system libraries on MacOS run threads

Sad panda. Interestingly the bug report seems to say it's been a problem since OSX 10.13, but I haven't noticed a problem on 10.14 with this code.
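
To make the exec vs fork difference concrete, here is a minimal sketch of the two ways a "watcher" process could start the actual task. It is not the code from the PR: `run_in_fresh_interpreter` and `run_in_fork` are hypothetical names used only for illustration, and the sketch assumes a POSIX platform where `os.fork` is available.

```python
import os
import subprocess
import sys
import traceback


def run_in_fresh_interpreter(code: str) -> int:
    """Existing flow (sketch): start a brand new Python process for the task.

    The child pays interpreter start-up and re-imports every module it needs.
    """
    return subprocess.call([sys.executable, "-c", code])


def run_in_fork(task) -> int:
    """Proposed flow (sketch): fork the already-initialised watcher.

    The child inherits the parent's imported modules, so nothing is reloaded.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: run the task, then leave without running the
        # parent's atexit handlers or flushing its inherited buffers.
        try:
            task()
        except BaseException:
            traceback.print_exc()
            os._exit(1)
        os._exit(0)

    # Parent (the "watcher"): wait for the child and translate its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # killed by a signal


if __name__ == "__main__":
    print(run_in_fresh_interpreter("print('task in a new interpreter')"))
    print(run_in_fork(lambda: print("task in a forked child")))
```

Either way there is exactly one extra process per task, which matches the point above that the process count does not change. With `multiprocessing`, the start method can be requested explicitly via `multiprocessing.get_context("fork")`, which matters on platforms where the default is `spawn`.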
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
