potiuk commented on a change in pull request #21877:
URL: https://github.com/apache/airflow/pull/21877#discussion_r834057818
##########
File path: airflow/jobs/local_task_job.py
##########
@@ -104,11 +104,6 @@ def signal_handler(signum, frame):
try:
self.task_runner.start()
- # Unmap the task _after_ it has forked/execed. (This is a bit of a
kludge, but if we unmap before
Review comment:
Welll. I think this is not really good idea to remove the fork. Correct
me if I am wrong, but this will ALWAYS add overhead for starting the process
and launching a new interpreter, where `fork` was supposed to speed it up
immensely. And IMHO it has nothing to do with where DAGs are parsed.
While it might not matter for your case where you have huge DAGs to parse
and "fork" vs. "launch process" overhead is negligible, it will significantly
change the characteristics and overhead in case the DAGs are small (because the
whole interpreter will need to be launched in order to run the task. This
impacts both memory and time and I think we should not remove this optimization.
With fork, we use the fact that forking costs almost nothing - both in terms
of time and memory. Forked processes use copy-on-write semantics for memory
used, which means that most of the dynamic memory allocated before forking will
not be re-allocated again in the fork (unless the memory changes).
Unless I missed something - this is a huge change and we should not remove
fork option.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]