SamWheating opened a new pull request #22531: URL: https://github.com/apache/airflow/pull/22531
When running `airflow db init`, Airflow parses all of the DAGs in the configured DAG folder sequentially in a single process. When a large number of DAGs are present, this can _significantly_ increase the time it takes for the `db init` command to run.

In my opinion, initializing the DB and populating it with data are separate tasks and shouldn't be combined into a single function. The background DAG processor is also much faster at parsing files and populating the DB because it uses multiprocessing.

I propose splitting the bootstrapping of the DagBag out into a separate function (so as not to introduce any changes to the test setup / teardown process) and removing it from the `db init` and `db reset` commands.

Let me know if there's anything I'm missing here, or if there's a reason for parsing DAGs at this point that I may have overlooked.
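
To illustrate the shape of the proposed split, here is a minimal sketch. It does not use Airflow's actual code: `create_schema`, `bootstrap_dagbag`, `init_db`, and `reset_db_for_tests` are hypothetical stand-ins, and the `log` list only records which steps ran.

```python
from typing import List


def create_schema(log: List[str]) -> None:
    """Hypothetical stand-in for creating tables and running migrations."""
    log.append("schema")


def bootstrap_dagbag(log: List[str], dag_files: List[str]) -> None:
    """Hypothetical stand-in for parsing each DAG file and syncing it to the DB.

    Runs sequentially in one process, which is the slow part described above.
    """
    for path in dag_files:
        log.append(f"parse:{path}")


def init_db(log: List[str]) -> None:
    """What `db init` would do after the change: schema only, no DAG parsing."""
    create_schema(log)


def reset_db_for_tests(log: List[str], dag_files: List[str]) -> None:
    """Test setup keeps its current behaviour by calling both steps explicitly."""
    init_db(log)
    bootstrap_dagbag(log, dag_files)
```

Under these assumptions, the CLI path calls only `init_db`, while test setup / teardown opts in to `bootstrap_dagbag`, so existing tests see the same populated state as before.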
