potiuk commented on a change in pull request #21877:
URL: https://github.com/apache/airflow/pull/21877#discussion_r834057818



##########
File path: airflow/jobs/local_task_job.py
##########
@@ -104,11 +104,6 @@ def signal_handler(signum, frame):
         try:
             self.task_runner.start()
 
-            # Unmap the task _after_ it has forked/execed. (This is a bit of a 
kludge, but if we unmap before

Review comment:
       Welll. I think this is not really good idea to remove the fork. Correct 
me if I am wrong, but this will ALWAYS add overhead for starting the process 
and launching a new interpreter, where `fork` was supposed to speed it up 
immensely. And IMHO it has nothing to do with where DAGs are parsed.
   
   While it might not matter for your case where you have huge DAGs to parse 
and "fork" vs. "launch process" overhead is negligible, it will significantly 
change the characteristics and overhead in case the DAGs are small (because the 
whole interpreter will need to be launched in order to run the task. This 
impacts both memory and time and I think we should not remove this optimization.
   
   With fork, we use the fact that forking costs almost nothing - both in terms 
of time and memory. Forked processes use copy-on-write semantics for memory 
used, which means that most of the dynamic memory allocated before forking will 
not be re-allocated again in the fork (unless the memory changes).
   
   Unless I missed something - this is a huge change and we should not remove 
it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to