[ https://issues.apache.org/jira/browse/AIRFLOW-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik closed AIRFLOW-4924.
-------------------------------
    Resolution: Delivered

Delivered using AIP-24

> Loading DAGs asynchronously in Airflow webserver
> ------------------------------------------------
>
>                 Key: AIRFLOW-4924
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4924
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: webserver
>    Affects Versions: 1.10.4
>            Reporter: Zhou Fang
>            Assignee: Zhou Fang
>            Priority: Major
>              Labels: features, scalability, webserver
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> h2. Scalability Issue in Webserver
> Airflow webserver uses gunicorn workers to serve HTTP requests. Each worker
> loads all DAGs from the DAG files before it serves requests. If there are
> many DAGs (e.g., more than 1,000), loading all of them can take a
> significant amount of time.
> Airflow webserver also relies on restarting gunicorn workers to refresh all
> DAGs. The refresh interval is set by webserver-worker_refresh_interval,
> which defaults to 30s. As a result, if loading all DAGs takes more than 30s,
> workers are restarted before they finish loading and the webserver is never
> ready for HTTP requests.
> The current workaround is to skip loading DAGs by setting the env var
> SKIP_DAGS_PARSING. This makes the webserver work, but no DAGs appear in the
> UI.
> h2. Asynchronous DAG Loading
> The solution here is to load DAGs asynchronously in the background. A
> background process loads the DAGs, stringifies them, and sends them to the
> gunicorn worker processes. The stringifying step is needed because some
> fields cannot be pickled, e.g., locally defined functions and user-defined
> modules. It aggressively transforms all fields of each DAG and task to be
> string-compatible.
> This feature is enabled by webserver-async_dagbag_loader=True. The background
> process sends DAGs to the gunicorn workers gradually (every
> webserver-dagbag_sync_interval). The DAG refresh interval is controlled by
> webserver-collect_dags_interval.
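
[Illustrative note on the design quoted above. This is a minimal sketch, not
the Composer or Airflow implementation. It assumes DAGs can be represented as
plain dicts, and every name in it (can_pickle, stringify, loader_loop,
start_background_loader, batch_size, sync_interval) is hypothetical. It shows
why a locally defined function blocks pickling, how a stringify pass makes the
data safe to send between processes, and how a background process could hand
DAGs over in batches.]

```python
import multiprocessing as mp
import pickle
import time


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:  # the exact error type varies by object and Python version
        return False


def stringify(value):
    """Recursively reduce a value to None, bool, int, float, str, list, or dict."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, dict):
        return {str(key): stringify(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify(item) for item in value]
    # Functions, classes, and other objects are replaced by their string form.
    return str(value)


def loader_loop(load_dags, out_queue, batch_size, sync_interval=0.0):
    """Load all DAGs, stringify them, and send them out batch by batch."""
    dags = [stringify(dag) for dag in load_dags()]
    for start in range(0, len(dags), batch_size):
        out_queue.put(dags[start:start + batch_size])
        time.sleep(sync_interval)  # pace the hand-over (assumed behaviour)
    out_queue.put(None)  # sentinel: this round of loading is complete


def start_background_loader(load_dags, batch_size=100, sync_interval=0.0):
    """Run loader_loop in a separate process and return (process, queue).

    load_dags must be a module-level function so that it can be handed to
    the child process under the "spawn" start method.
    """
    out_queue = mp.Queue()
    process = mp.Process(
        target=loader_loop,
        args=(load_dags, out_queue, batch_size, sync_interval),
        daemon=True,
    )
    process.start()
    return process, out_queue
```

[In this sketch a worker would call out_queue.get() repeatedly and merge each
batch into its own DAG collection until it receives the None sentinel, so it
can serve requests with whatever has arrived so far instead of blocking until
every DAG is loaded.]
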
> Asynchronous DAG loading has been released in Google Cloud Composer as an
> Alpha feature:
> [https://cloud.google.com/composer/docs/release-notes]
> [https://cloud.google.com/composer/docs/how-to/accessing/airflow-web-interface]
> This issue is created to merge the feature to Airflow upstream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)