[ https://issues.apache.org/jira/browse/AIRFLOW-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhou Fang updated AIRFLOW-4924:
-------------------------------
Affects Version/s:     (was: 1.10.3)
                       (was: 1.10.2)
    Fix Version/s:     (was: 1.10.3)
                       (was: 1.10.2)
> Loading DAGs asynchronously in Airflow webserver
> ------------------------------------------------
>
> Key: AIRFLOW-4924
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4924
> Project: Apache Airflow
> Issue Type: New Feature
> Components: webserver
> Affects Versions: 1.10.4
> Reporter: Zhou Fang
> Assignee: Zhou Fang
> Priority: Major
> Labels: features, scalability, webserver
> Fix For: 1.10.4
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> h2. Scalability Issue in Webserver
> The Airflow webserver uses gunicorn workers to serve HTTP requests. It loads
> all DAGs from the DAG files before serving any request. With many DAGs (e.g.,
> more than 1,000), loading them all can take a significant amount of time.
> The webserver also relies on restarting gunicorn workers to refresh DAGs. The
> refresh interval is set by webserver-worker_refresh_interval, which defaults
> to 30s. As a result, if loading all DAGs takes longer than this interval, the
> webserver never becomes ready to serve HTTP requests.
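The failure mode above can be illustrated with a toy model. It assumes, as the description does, that each gunicorn worker is replaced once per refresh interval and can serve only after it has loaded all DAGs. It does not reproduce Airflow's actual worker-restart logic, and `ready_seconds` is a hypothetical name.

```python
def ready_seconds(load_seconds: float, refresh_interval: float, horizon: float) -> float:
    """Toy model (an assumption, not Airflow's real restart logic).

    A fresh worker starts every `refresh_interval` seconds and can serve
    requests only after spending `load_seconds` loading all DAGs.
    Returns how many seconds within `horizon` a ready worker exists.
    """
    ready = 0.0
    start = 0.0
    while start < horizon:
        end = min(start + refresh_interval, horizon)  # worker is recycled here
        ready_at = start + load_seconds               # worker finishes loading DAGs
        if ready_at < end:
            ready += end - ready_at
        start += refresh_interval
    return ready


# Default 30s refresh interval, observed over 5 minutes:
print(ready_seconds(10, 30, 300))  # 10s load: ready 20s of every 30s -> 200.0
print(ready_seconds(45, 30, 300))  # 45s load: never ready -> 0.0
```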
> The current workaround is to skip DAG loading by setting the environment
> variable SKIP_DAGS_PARSING. The webserver then starts, but no DAGs appear in
> the UI.
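Here is a minimal sketch of how an environment-variable switch like this can gate DAG loading. The function name and the `load_all_dags` callable are hypothetical. Comparing the value against the string "True" is also an assumption about how the variable is interpreted.

```python
import os
from typing import Callable, Dict, Mapping


def dags_for_ui(load_all_dags: Callable[[], Dict[str, object]],
                environ: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Hypothetical gate around DAG parsing in the webserver."""
    if environ.get("SKIP_DAGS_PARSING") == "True":
        # Startup is fast, but the UI has no DAGs to display.
        return {}
    # Otherwise parse every DAG file up front (slow when there are many DAGs).
    return load_all_dags()
```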
> h2. Asynchronous DAG Loading
> The proposed solution is to load DAGs asynchronously in the background. A
> background process loads the DAGs, stringifies them, and sends them to the
> gunicorn worker processes. The stringifying step is needed because some
> fields cannot be pickled, e.g., locally defined functions and user-defined
> modules. It aggressively transforms all fields of DAGs and tasks to be
> string-compatible.
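The pickling problem and the stringify idea can be sketched with the standard `pickle` module. `stringify` and `is_picklable` are hypothetical helpers. The conversion rule used here keeps primitives, recurses into containers, and replaces everything else with its `repr()`. That is one possible way to make fields string-compatible, not necessarily the rule the feature uses.

```python
import pickle
import types


def make_callback():
    # Locally defined function: pickle stores functions by qualified name,
    # and "make_callback.<locals>.on_success" cannot be looked up again.
    def on_success(context):
        return context
    return on_success


def stringify(value):
    """Recursively turn a value into pickle-safe primitives (hypothetical helper)."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, dict):
        return {str(k): stringify(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [stringify(v) for v in value]
    # Functions, modules, and any other objects become their repr() string.
    return repr(value)


def is_picklable(obj) -> bool:
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


task_fields = {
    "task_id": "extract",
    "retries": 3,
    "on_success_callback": make_callback(),
    "helpers": types.ModuleType("my_helpers"),  # stands in for a user-defined module
}
print(is_picklable(task_fields))             # False
print(is_picklable(stringify(task_fields)))  # True
```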
> This feature is enabled by setting webserver-async_dagbag_loader=True. The
> background process sends DAGs to the gunicorn workers gradually (every
> webserver-dagbag_sync_interval). The DAG refresh interval is controlled by
> webserver-collect_dags_interval.
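The two intervals can be sketched as a producer/consumer pair. For simplicity, the sketch uses a thread and `queue.Queue` in place of the separate background process and whatever channel the feature uses to reach the gunicorn workers. `background_loader`, `WorkerDagCache`, `batch_size`, and `rounds` are hypothetical. `sync_interval` and `collect_interval` stand in for webserver-dagbag_sync_interval and webserver-collect_dags_interval.

```python
import queue
import threading
import time
from typing import Callable, Dict


def background_loader(collect_dags: Callable[[], Dict[str, dict]],
                      out_q: queue.Queue,
                      batch_size: int,
                      sync_interval: float,
                      collect_interval: float,
                      rounds: int) -> None:
    """Re-collect all DAGs `rounds` times, waiting `collect_interval` between
    rounds, and send them `batch_size` at a time, pausing `sync_interval`
    seconds after each batch."""
    for _ in range(rounds):
        # Stringify each DAG's fields so the payload is always transferable.
        items = [(dag_id, {k: str(v) for k, v in fields.items()})
                 for dag_id, fields in collect_dags().items()]
        for i in range(0, len(items), batch_size):
            out_q.put(dict(items[i:i + batch_size]))
            time.sleep(sync_interval)
        time.sleep(collect_interval)


class WorkerDagCache:
    """Worker side: merge whatever batches have arrived without blocking."""

    def __init__(self, in_q: queue.Queue) -> None:
        self._in_q = in_q
        self.dags: Dict[str, dict] = {}

    def sync(self) -> int:
        while True:
            try:
                batch = self._in_q.get_nowait()
            except queue.Empty:
                return len(self.dags)
            self.dags.update(batch)


def collect() -> Dict[str, dict]:
    return {f"dag_{i}": {"schedule": "@daily", "retries": i} for i in range(5)}


q = queue.Queue()
loader = threading.Thread(target=background_loader,
                          args=(collect, q, 2, 0.01, 0.0, 1), daemon=True)
loader.start()
loader.join()
cache = WorkerDagCache(q)
print(cache.sync())  # 5 DAGs, delivered in three batches of at most 2
```

In this sketch, the worker only drains batches that have already arrived, so it never waits on DAG parsing. DAGs simply appear as batches land.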
> Asynchronous DAG loading has been released in Google Cloud Composer as an
> Alpha feature:
> [https://cloud.google.com/composer/docs/release-notes]
>
> [https://cloud.google.com/composer/docs/how-to/accessing/airflow-web-interface]
> This issue was created to merge the feature into upstream Airflow.
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)