[ 
https://issues.apache.org/jira/browse/AIRFLOW-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik closed AIRFLOW-4924.
-------------------------------
    Resolution: Delivered

Delivered using AIP-24

> Loading DAGs asynchronously in Airflow webserver
> ------------------------------------------------
>
>                 Key: AIRFLOW-4924
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4924
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: webserver
>    Affects Versions: 1.10.4
>            Reporter: Zhou Fang
>            Assignee: Zhou Fang
>            Priority: Major
>              Labels: features, scalability, webserver
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> h2. Scalability Issue in Webserver
> The Airflow webserver uses gunicorn workers to serve HTTP requests. It loads 
> all DAGs from the DAG files before serving requests. If there are many DAGs 
> (e.g., more than 1,000), loading them all can take a significant amount of 
> time.
> The webserver also relies on restarting gunicorn workers to refresh all 
> DAGs. The refresh interval is set by webserver-worker_refresh_interval, 
> which defaults to 30s. As a result, if loading all DAGs takes longer than 
> 30s, workers are restarted before they finish loading, and the webserver is 
> never ready to serve HTTP requests.
> The current workaround is to skip loading DAGs by setting the env var 
> SKIP_DAGS_PARSING. This keeps the webserver working, but no DAGs appear in 
> the UI.
> h2. Asynchronous DAG Loading
> The solution here is to load DAGs asynchronously in the background. A 
> background process loads the DAGs, stringifies them, and sends them to the 
> gunicorn worker processes. The stringifying step is needed because some 
> fields cannot be pickled, e.g., locally defined functions and user-defined 
> modules. It aggressively transforms all fields of each DAG and task to be 
> string-compatible.
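
The stringifying step described above could look roughly like the following. This is a minimal sketch, not the code from the actual patch: the `stringify` helper, the `make_task` factory, and the dict standing in for a task are all hypothetical names made up for illustration.

```python
import pickle


def stringify(value):
    """Recursively convert a value into plain, picklable data (hypothetical helper)."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, dict):
        return {str(k): stringify(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify(v) for v in value]
    # Anything else (functions, custom objects) is reduced to its text form.
    return str(value)


def make_task():
    # A locally defined function, which pickle cannot serialize by reference.
    def python_callable():
        return "done"

    # A plain dict stands in for a task object here (assumption for brevity).
    return {"task_id": "t1", "retries": 2, "python_callable": python_callable}


task = make_task()

# Pickling the raw task fails because of the local function.
try:
    pickle.dumps(task)
    picklable_before = True
except (pickle.PicklingError, AttributeError, TypeError):
    picklable_before = False

# After stringifying, the task round-trips through pickle.
safe_task = stringify(task)
restored = pickle.loads(pickle.dumps(safe_task))
```

The trade-off is that the callable is no longer executable on the receiving side, which is presumably acceptable for a webserver that only needs to display DAG and task metadata.
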
> This feature is enabled by setting webserver-async_dagbag_loader=True. The 
> background process sends DAGs to the gunicorn workers gradually (once every 
> webserver-dagbag_sync_interval). The DAG refresh interval is controlled by 
> webserver-collect_dags_interval.
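
As a rough illustration of the gradual hand-off described above, the sketch below has a background loader push DAGs to a consumer in small batches, pausing between batches. It makes several assumptions: a thread and `queue.Queue` stand in for the separate background process and its channel to the gunicorn worker, `batch_size` is a hypothetical parameter, `sync_interval` plays the role of webserver-dagbag_sync_interval, and the periodic re-collection governed by webserver-collect_dags_interval is not modelled.

```python
import queue
import threading
import time


def background_loader(dag_ids, out_queue, batch_size, sync_interval):
    """Send DAG ids to the consumer in batches, one batch per sync interval."""
    for start in range(0, len(dag_ids), batch_size):
        out_queue.put(dag_ids[start:start + batch_size])
        time.sleep(sync_interval)
    out_queue.put(None)  # Sentinel: loading is complete.


dag_queue = queue.Queue()
loader = threading.Thread(
    target=background_loader,
    args=(["dag_a", "dag_b", "dag_c", "dag_d", "dag_e"], dag_queue, 2, 0.01),
    daemon=True,
)
loader.start()

# The "worker" side: it could already serve requests while its DAG bag
# fills up batch by batch.
dagbag = {}
batches = 0
while True:
    batch = dag_queue.get()
    if batch is None:
        break
    batches += 1
    for dag_id in batch:
        dagbag[dag_id] = {"dag_id": dag_id}

loader.join()
```

The point of the design is that the worker never blocks on a full DAG load: it starts with an empty bag and picks up whatever has arrived so far.
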
> Asynchronous DAG loading has been released in Google Cloud Composer as an 
> Alpha feature:
>  [https://cloud.google.com/composer/docs/release-notes]
>  
> [https://cloud.google.com/composer/docs/how-to/accessing/airflow-web-interface]
> This issue was created to merge the feature into upstream Airflow.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
