Jeremiah Lowin created AIRFLOW-862:
--------------------------------------

             Summary: Add DaskExecutor
                 Key: AIRFLOW-862
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-862
             Project: Apache Airflow
          Issue Type: New Feature
          Components: executor
            Reporter: Jeremiah Lowin
            Assignee: Jeremiah Lowin


The Dask Distributed sub-project makes it very easy to create pure-Python
clusters of Dask workers, ranging from a personal laptop to thousands of
networked cores. The workers can execute arbitrary functions submitted to the
Dask scheduler node. A full Dask application would involve multiple tasks with
data dependencies (similar in philosophy to an Airflow DAG), but Dask will
happily run single functions as well.

The DaskExecutor is configured by supplying the IP address of the Dask 
Scheduler. It submits Airflow commands to the cluster for execution (note: the 
cluster should have access to any Airflow dependencies, including Airflow 
itself!) and checks the resulting futures to see if the tasks completed 
successfully.
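
The submit-then-check-futures flow described above can be sketched roughly as
follows. This is only an illustration, not the proposed implementation: it
uses the standard-library concurrent.futures pool as a stand-in for a Dask
client (whose submit() call likewise returns futures), and the names
run_command, FutureTrackingExecutor, execute_async and sync are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(command):
    """Run a command; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(command)


class FutureTrackingExecutor:
    """Hypothetical sketch: submit commands to a pool and poll the futures.

    `pool` is any object with a submit(fn, *args) method that returns a
    future. A ThreadPoolExecutor is assumed here as a stand-in for a Dask
    client connected to the scheduler.
    """

    def __init__(self, pool):
        self.pool = pool
        self.futures = {}  # maps future -> task key

    def execute_async(self, key, command):
        # Hand the command to the pool and remember which task it belongs to.
        self.futures[self.pool.submit(run_command, command)] = key

    def sync(self):
        """Return {key: state} for every finished future and forget it."""
        states = {}
        for future, key in list(self.futures.items()):
            if not future.done():
                continue  # still running; check again on the next sync
            del self.futures[future]
            states[key] = "failed" if future.exception() else "success"
        return states
```

A caller would submit each Airflow command with execute_async() and call
sync() periodically to learn which tasks succeeded or failed. The commands run
on the workers, which is why the cluster needs Airflow and its dependencies
installed.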

Some advantages of using Dask for parallel execution over LocalExecutor or 
CeleryExecutor are:
  - simple scaling, from local machines to remote clusters
  - pure-Python implementation (minimal dependencies and no need to run
    additional databases)
  - built-in, live-updating web UI for monitoring the cluster
  
** Note: This does NOT replace the Airflow scheduler or DAG engine with the 
analogous Dask versions; it just uses the Dask cluster to run Airflow tasks.







