SamWheating opened a new pull request #22531:
URL: https://github.com/apache/airflow/pull/22531


   When running `airflow db init`, Airflow parses all of the DAGs in the 
configured DAGs folder sequentially in a single process. When a large 
number of DAGs are present, this can _significantly_ increase the time it 
takes for the `db init` command to run. 
   
   In my opinion, initializing the DB and populating it with data are separate 
tasks and shouldn't be combined into a single function. The background DAG 
processor is also much faster at parsing files and populating the DB because 
it uses multiprocessing. 
   
   I propose splitting the bootstrapping of the DagBag out into a separate 
function (so as to not introduce any changes to the test setup / teardown 
process) and removing it from the `db init` and `db reset` commands.
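
   A minimal sketch of the shape of this split, not the actual diff. All of the names below (`create_tables`, `bootstrap_dagbag`, `init_db`, `reset_test_db`) are hypothetical stand-ins rather than Airflow's real functions, and the bodies only record which steps ran:

```python
from typing import List


def create_tables(log: List[str]) -> None:
    """Hypothetical stand-in for creating the schema / running migrations."""
    log.append("create_tables")


def bootstrap_dagbag(log: List[str]) -> None:
    """Hypothetical stand-in for parsing the DAG folder and syncing it to the DB."""
    log.append("bootstrap_dagbag")


def init_db(log: List[str]) -> None:
    """What the `db init` / `db reset` path would do after the split.

    Only the schema is set up here; DAG parsing is left to the
    background DAG processor.
    """
    create_tables(log)


def reset_test_db(log: List[str]) -> None:
    """What the test setup would do: same steps as before the split.

    The test helper calls the new bootstrap function explicitly, so
    test setup / teardown behaviour is unchanged.
    """
    init_db(log)
    bootstrap_dagbag(log)


if __name__ == "__main__":
    cli_steps: List[str] = []
    init_db(cli_steps)
    print("db init steps:", cli_steps)

    test_steps: List[str] = []
    reset_test_db(test_steps)
    print("test setup steps:", test_steps)
```

   The point of the separate function is that callers who still need a populated DagBag (such as tests) opt in explicitly, while the CLI commands no longer pay the parsing cost.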
   
   Let me know if there's anything I'm missing here, such as a reason for 
parsing DAGs during these commands. 
   
   
   
   



