GitHub user EvertonSA created a discussion: Dag processor behavior at
considered scale
Hi colleagues, my apologies if these questions have been asked before.
- Can I maintain two (or more) replicas of the DAG processor?
- If yes, will both replicas process the same file twice? Do they have some
sort of locking mechanism?
Is there any advice on the setup below?
We have been running with 1 replica, as we are unsure whether two replicas can
handle our setup. We would also like to give faster feedback after DAGs are put
on the filesystem.
My setup is something like:
- Airflow 3.1.8
- 1 local filesystem bundle backed by Azure Files
- thousands of Python files in this location
- some DAGs parse faster, some slower; a combination of static and dynamic
DAGs, some with top-level code
- config: file_parsing_sort_mode: random_seeded_by_host
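
For readers unfamiliar with the sort mode, here is a minimal sketch of what a host-seeded shuffle looks like. It is an illustration only, not Airflow's actual code: the function name `order_for_host`, the file names, and the host names `dag-processor-0` / `dag-processor-1` are all hypothetical. The point is that each host gets its own stable order over the same full list of files; a shuffle like this reorders work but does not by itself partition or lock files between replicas.

```python
import random


def order_for_host(file_paths, hostname):
    """Return file_paths in a pseudo-random order that is stable per hostname.

    Hypothetical helper for illustration; not part of Airflow.
    """
    shuffled = sorted(file_paths)  # start from a stable baseline order
    random.Random(hostname).shuffle(shuffled)  # seed the RNG with the host name
    return shuffled


# Hypothetical file list and host names.
files = [f"dags/dag_{i:04d}.py" for i in range(10)]
order_a = order_for_host(files, "dag-processor-0")
order_b = order_for_host(files, "dag-processor-1")

# Same host, same order on every call; both hosts still cover every file.
print(order_a == order_for_host(files, "dag-processor-0"))
print(set(order_a) == set(order_b) == set(files))
```

Whether two replicas actually avoid parsing the same file twice depends on what the DAG processor does beyond ordering, which is the open question above.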
Config:
dag_processor:
  refresh_interval: 600
  stale_dag_threshold: 300
  min_file_process_interval: 600
  dag_file_processor_timeout: 600
  parsing_processes: 16
  file_parsing_sort_mode: random_seeded_by_host
Running on an 8 vCPU, 16 GiB node.
Any advice is appreciated.
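
To make the sizing question concrete, here is a rough back-of-envelope sketch of how long one full parsing pass might take. The file count (3000) and average parse time (2 seconds) are made-up placeholders, not measurements from this setup, and the formula assumes perfectly even work distribution with no I/O stalls, which is optimistic for a share backed by Azure Files. Only `parsing_processes: 16`, the 8 vCPUs, and `min_file_process_interval: 600` come from the configuration above.

```python
def full_pass_seconds(num_files, avg_parse_seconds, workers):
    """Idealized time for one pass over all files with evenly spread work."""
    return num_files * avg_parse_seconds / workers


NUM_FILES = 3000          # assumption: "thousands of Python files"
AVG_PARSE_SECONDS = 2.0   # assumption: not measured
MIN_FILE_PROCESS_INTERVAL = 600  # from the config above

# One estimate per worker count: 16 configured processes, and 8 as a
# pessimistic case where CPU-bound parsing is limited by the 8 vCPUs.
for workers in (16, 8):
    seconds = full_pass_seconds(NUM_FILES, AVG_PARSE_SECONDS, workers)
    fits = seconds <= MIN_FILE_PROCESS_INTERVAL
    print(f"{workers} workers: {seconds:.0f}s per pass, within interval: {fits}")
```

With these placeholder numbers, the pass fits inside the 600-second interval at 16 effective workers but not at 8, so measuring the real average parse time would show which case applies.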
GitHub link: https://github.com/apache/airflow/discussions/64944