GitHub user EvertonSA created a discussion: Dag processor behavior at 
considered scale

Hi colleagues, perhaps these questions have been asked before; if so, my 
apologies.

- Can I run two (or more) replicas of the DAG processor?

- If yes, will both replicas process the same file twice? Do they have some 
sort of locking mechanism?  

Is there any advice on the setup below?

We have been running with 1 replica because we are unsure whether two replicas 
can handle our setup. We would also like to give faster feedback after DAGs 
are placed on the filesystem.

My setup is something like:
- Airflow 3.1.8
- 1 local filesystem bundle backed by Azure Files
- thousands of Python files in this location
- some DAGs parse faster and some slower; a combination of static and 
dynamic DAGs, some with top-level code
- config: file_parsing_sort_mode: random_seeded_by_host
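
For context on that last setting, here is a minimal sketch of the idea behind `random_seeded_by_host` as the Airflow configuration docs describe it: a random order that stays the same on one host but differs between hosts. This is not Airflow's actual code. The `parse_order` helper, the file names, and the hostnames are all hypothetical, and the sketch does not model any locking or work splitting between replicas, which is exactly the open question above.

```python
import random


def parse_order(files, hostname):
    """Return the files shuffled deterministically per hostname (illustrative only)."""
    ordered = sorted(files)  # start from a stable base order
    # Seeding with the hostname gives the same order on the same host
    # and, in general, a different order on a different host.
    random.Random(hostname).shuffle(ordered)
    return ordered


files = [f"dags/team_{i}.py" for i in range(8)]  # hypothetical file names
order_a = parse_order(files, "dag-processor-0")  # hypothetical hostnames
order_b = parse_order(files, "dag-processor-1")
print(order_a)
print(order_b)
```

Each replica still sees the full file list in this sketch; only the order in which it would walk the list changes.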

Config: 

  dag_processor:
    refresh_interval: 600
    stale_dag_threshold: 300
    min_file_process_interval: 600
    dag_file_processor_timeout: 600
    parsing_processes: 16
    file_parsing_sort_mode: random_seeded_by_host
    
Running on an 8 vCPU, 16 GiB node.
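
To make the "faster feedback" concern concrete, here is a rough back-of-envelope sketch of how long one full pass over the bundle might take with these settings. The file count (3000) and the average parse time (2 seconds) are made-up placeholders, not measurements, and capping parallelism at the vCPU count is a simplifying assumption that fits CPU-bound parsing better than I/O-bound reads from Azure Files.

```python
def full_pass_seconds(num_files, avg_parse_seconds, parsing_processes, vcpus):
    """Estimate the wall-clock time for one pass over all DAG files."""
    # Assumption: CPU-bound parsing cannot usefully exceed the core count,
    # so the effective parallelism is capped at the number of vCPUs.
    effective = min(parsing_processes, vcpus)
    return num_files * avg_parse_seconds / effective


# Hypothetical inputs: 3000 files at 2.0 s each; 16 processes on 8 vCPUs
# mirror the parsing_processes setting and node size described in this post.
estimate = full_pass_seconds(
    num_files=3000, avg_parse_seconds=2.0, parsing_processes=16, vcpus=8
)
print(f"{estimate:.0f} s per full pass")
```

Comparing an estimate like this with `min_file_process_interval` (600 here) gives a first hint of whether a single replica can keep up, but real numbers would have to come from the DAG processor's own parsing statistics.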

Any advice is appreciated.

GitHub link: https://github.com/apache/airflow/discussions/64944
