[ 
https://issues.apache.org/jira/browse/AIRFLOW-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992418#comment-16992418
 ] 

ASF subversion and git services commented on AIRFLOW-5931:
----------------------------------------------------------

Commit f69aa14a021a6160ecdb75678fd40cf00a404525 in airflow's branch 
refs/heads/master from Ash Berlin-Taylor
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=f69aa14 ]

[AIRFLOW-5931] Use os.fork when appropriate to speed up task execution. (#6627)

* [AIRFLOW-5931] Use os.fork when appropriate to speed up task execution.

  Rather than running a fresh Python interpreter, which then has to re-load
  all of Airflow and its dependencies, we should use os.fork when it is
  available/suitable. This should speed up task running, especially for
  short-lived tasks.

  I've profiled this and it took the task duration (as measured by the
  `duration` column in the TI table) from an average of 14.063s down to
  just 0.932s!

* Allow `reap_process_group` to kill processes even when the "group
  leader" has already exited.

* Don't re-initialize ElasticSearch JSON/stdout logging inside forked processes

  Most of the time we will run the "raw" task in a forked subprocess (the
  only time we don't is when we use impersonation) that will have the
  logging already configured. So if the EsTaskHandler has already been
  configured we don't want to "re"configure it -- otherwise it will
  disable JSON output for the actual task!
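
  The fork-versus-fresh-interpreter trade-off described in the first item
  can be sketched roughly as follows. This is an illustration only, not the
  code from the commit: `CAN_FORK`, `run_forked` and `run_fresh` are
  hypothetical names, and the sketch assumes a POSIX platform for the fork
  path.

```python
import os
import subprocess
import sys
import traceback

# Hypothetical names for illustration; not Airflow's actual API.
CAN_FORK = hasattr(os, "fork")  # os.fork does not exist on Windows


def run_forked(func):
    """Run func() in a forked child and return the child's exit code."""
    # Flush first so buffered output is not duplicated in the child.
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:
        # Child: inherits every module the parent has already imported,
        # so nothing needs to be re-loaded before func runs.
        try:
            func()
        except BaseException:
            traceback.print_exc()
            os._exit(1)
        # os._exit avoids unwinding back into the parent's call stack.
        sys.stdout.flush()
        os._exit(0)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def run_fresh(source):
    """Run Python source in a brand-new interpreter (the slow path)."""
    return subprocess.call([sys.executable, "-c", source])
```

  In the forked path the child skips interpreter start-up and the import
  of already-loaded modules, which is the saving described above. The
  fresh-interpreter path is still needed where forking is unavailable or
  unsuitable.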


> Spawning new python interpreter for every task slow
> ---------------------------------------------------
>
>                 Key: AIRFLOW-5931
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5931
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: executors, worker
>    Affects Versions: 2.0.0
>            Reporter: Ash Berlin-Taylor
>            Assignee: Ash Berlin-Taylor
>            Priority: Major
>
> There are a number of places in the Executors and Task Runners where we spawn
> a whole new Python interpreter.
> My profiling has shown that this is slow. Rather than running a fresh Python
> interpreter, which then has to re-load all of Airflow and its dependencies, we
> should use {{os.fork}} when it is available/suitable, which should speed up
> task running, especially for short-lived tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
