boushphong opened a new issue, #35231: URL: https://github.com/apache/airflow/issues/35231
### Description

Basically, we'd want the dag-processor component to run all of its operations in the `_run_parsing_loop` method only once, then terminate gracefully after it saves the parsing results to the metadata db.

### Use case/motivation

We have an Airflow deployment that parses a large number of DAGs (some are dynamic DAGs), and the parsing often takes a very long time. So it would be nice to have the dag-processor parse all the DAG files only once and then terminate itself, instead of running continuously, to save resources.

The use case would be:

- Users push their code to the DAG repo.
- The dag-processor runs in the CI/CD process and saves the DAG parsing results to the metadata database. This can happen incrementally.

Currently, we run the airflow scheduler and the airflow dag-processor separately. However, I've noticed that the scheduler also deactivates stale dags when `standalone_dag_processor=True`, so we cannot yet run the dag-processor in the CI process. There is a workaround: set `dag_stale_not_seen_duration` to a very big number so that the scheduler would never deactivate stale dags.

The modifications would be:

- Give the scheduler an option to not deactivate stale dags.
- Enable the dag-processor to do all of its operations in `_run_parsing_loop` only once and then gracefully terminate itself.

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
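The "parse once, then exit" behavior requested above could be sketched roughly as below. This is a minimal plain-Python illustration, not Airflow's actual `DagFileProcessorManager` implementation; `parse_dag_file`, `run_parsing_loop`, and the dict-based `results_db` are hypothetical placeholders for the real parsing and metadata-db persistence.

```python
from pathlib import Path

def parse_dag_file(path):
    # Placeholder for real DAG parsing; here we just record a line count.
    return {"path": str(path), "num_lines": len(Path(path).read_text().splitlines())}

def run_parsing_loop(dag_files, results_db, run_once=False):
    """Parse every DAG file in a loop, persisting results after each pass.

    With run_once=True the loop performs exactly one full pass over all
    files and then returns, instead of looping forever.
    """
    while True:
        for path in dag_files:
            results_db[str(path)] = parse_dag_file(path)
        if run_once:
            break  # graceful termination after a single complete pass
    return results_db
```

In a CI/CD job, such a flag would let the same parsing code run to completion once and exit with a status code, rather than being supervised as a long-running service.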
