[ https://issues.apache.org/jira/browse/AIRFLOW-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremiah Lowin updated AIRFLOW-862:
-----------------------------------
    External issue URL: https://github.com/apache/incubator-airflow/pull/2067

> Add DaskExecutor
> ----------------
>
>                 Key: AIRFLOW-862
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-862
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: executor
>            Reporter: Jeremiah Lowin
>            Assignee: Jeremiah Lowin
>
> The Dask Distributed sub-project makes it very easy to create pure-Python
> clusters of Dask workers ranging from a personal laptop to thousands of
> networked cores. The workers can execute arbitrary functions submitted to the
> Dask scheduler node. A full Dask app would involve multiple tasks with
> data dependencies (similar in philosophy to an Airflow DAG), but it will
> happily run single functions as well.
>
> The DaskExecutor is configured by supplying the IP address of the Dask
> Scheduler. It submits Airflow commands to the cluster for execution (note:
> the cluster should have access to any Airflow dependencies, including Airflow
> itself!) and checks the resulting futures to see if the tasks completed
> successfully.
>
> Some advantages of using Dask for parallel execution over LocalExecutor or
> CeleryExecutor are:
> - simple scaling, from local machines to remote clusters
> - pure-Python implementation (minimal dependencies and no need to run
>   additional databases)
> - built-in, live-updating web UI for monitoring the cluster
>
> ** Note: This does NOT replace the Airflow scheduler or DAG engine with the
> analogous Dask versions; it just uses the Dask cluster to run Airflow tasks.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
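
The submit-and-check flow described in the issue (hand a command to a pool of workers, then inspect the resulting futures) can be sketched roughly as below. This is only an illustration and not the code from the pull request: it uses the standard library's ThreadPoolExecutor as a stand-in for a Dask client, on the assumption that the Dask Distributed client exposes a comparable submit/future interface. The names SketchExecutor, run_command, and run_all are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, wait


def run_command(command):
    """Run one command; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(command)
    return 0


class SketchExecutor:
    """Hypothetical executor: submit commands, then check futures for state."""

    def __init__(self, pool):
        # Stand-in pool. With Dask this would presumably be a client
        # connected to the scheduler address (assumption, not shown here).
        self.pool = pool
        self.futures = {}  # maps future -> task key

    def execute_async(self, key, command):
        """Submit a command to the pool and remember its future."""
        future = self.pool.submit(run_command, command)
        self.futures[future] = key

    def sync(self):
        """Return {key: state} for finished tasks and stop tracking them."""
        states = {}
        for future in list(self.futures):
            if future.done():
                key = self.futures.pop(future)
                # A future that holds an exception means the command failed.
                states[key] = "failed" if future.exception() else "success"
        return states


def run_all(commands):
    """Submit every command, wait for all of them, and report final states."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        executor = SketchExecutor(pool)
        for key, command in commands.items():
            executor.execute_async(key, command)
        wait(list(executor.futures))
        return executor.sync()
```

For example, run_all({"ok": [sys.executable, "-c", "pass"]}) runs a no-op Python process in the pool and reports its state. As the issue notes, in a real cluster the workers would need access to Airflow and its dependencies for the submitted commands to run.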