ashb commented on issue #6627: [AIRFLOW-5931] Use os.fork when appropriate to 
speed up task execution.
URL: https://github.com/apache/airflow/pull/6627#issuecomment-557485763
 
 
   > Afaik it does Executor -> Task -> Rawtask. So with your change it would 
now do "Executor -> Task -> Rawtask -> New Process"? I.e. it hasn't become part 
of the executor I assume (that would be a no go). Just verifying.
   
   Not quite. The existing flow is:
   
   Executor -> exec to spawn new python to run Task "watcher" -> spawn new 
python to run actual Task
   
   My PR changes this to:
   
   Executor -> exec to spawn new python to run Task "watcher" -> fork to run 
actual Task
   
   The number of processes in use remains the same -- the only difference is 
how we create the processes, and whether we have to reload all of Python and the 
airflow modules or not. I am happy that the same semantics and isolation are 
maintained.
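   
   As a rough illustration of the difference (a minimal sketch, not the code 
from this PR; `run_via_exec` and `run_via_fork` are hypothetical names, and 
the fork variant assumes a POSIX system where `os.fork` exists):

```python
import os
import subprocess
import sys


def run_via_exec(code):
    """Start a brand new interpreter, which has to re-import every module."""
    proc = subprocess.run([sys.executable, "-c", code])
    return proc.returncode


def run_via_fork(task):
    """Run `task` in a forked child that inherits already-imported modules."""
    pid = os.fork()
    if pid == 0:
        # Child process: run the task, then leave without running the
        # parent's cleanup handlers.
        code = 1
        try:
            code = task()
        finally:
            os._exit(code)
    # Parent process: wait for the child and translate its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

   Both variants end up with one extra process for the task; only the fork 
variant skips the interpreter start-up and module imports.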
   
   We could also look at merging the "watcher" into the executor -- the main 
thing the watcher does is set the task to failed if it errors, or kill it if 
the TaskInstance state is changed externally (i.e. the watcher is what is 
responsible for sending a term/kill signal to the task when you clear it in the 
UI).
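   
   A minimal sketch of that watcher role, assuming a POSIX system. The 
`watch` function and the `state_changed_externally` callback are hypothetical 
stand-ins for illustration, not Airflow APIs:

```python
import os
import signal
import time


def watch(pid, state_changed_externally, poll_interval=0.1):
    """Wait for child `pid`; terminate it if its state changes externally.

    Returns the child's exit code, or the negative signal number if the
    child was killed by a signal.
    """
    while True:
        # Non-blocking check: has the child finished on its own?
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            break
        if state_changed_externally():
            # Hypothetical trigger, e.g. the task was cleared in the UI.
            os.kill(pid, signal.SIGTERM)
            _, status = os.waitpid(pid, 0)
            break
        time.sleep(poll_interval)

    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

   A non-zero or negative return value is where a real watcher would mark 
the task as failed.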
   
   And yes, longer term I also want to stop the workers accessing the DB 
directly.
   
   (Once this is merged/working I plan to fix the Local and Celery executors 
to tackle the exec vs fork there too.)
   
   I'll try using multiprocessing to do this.
   
   > Note that in python 3.8 default mode for the new process is spawn as 
forking on MacOS might cause crashes because threads are not safe for forking 
and some system libraries on MacOS run threads
   
   Sad panda. Interestingly the bug report seems to say it's been a problem 
since OSX 10.13, but I haven't noticed a problem on 10.14 with this code.
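   
   For reference, multiprocessing lets the start method be chosen explicitly 
instead of relying on the platform default. A minimal sketch 
(`run_with_start_method` and `_task` are hypothetical names; "fork" is only 
available on POSIX systems):

```python
import multiprocessing as mp


def _task():
    # Stand-in for the actual task body.
    pass


def run_with_start_method(method):
    """Run `_task` in a child created with the given start method."""
    # get_context returns an object with the same API as the
    # multiprocessing module, but bound to one start method.
    ctx = mp.get_context(method)
    proc = ctx.Process(target=_task)
    proc.start()
    proc.join()
    return proc.exitcode
```

   Calling `run_with_start_method("fork")` keeps the fork behaviour even on 
platforms where the default is "spawn". With "spawn" the target has to be 
importable by the child, and the calling script needs the usual 
`if __name__ == "__main__":` guard.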

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
