Mzhussiyev commented on issue #52711: URL: https://github.com/apache/airflow/issues/52711#issuecomment-3051290446
Hey @karenbraganz, thanks for the detailed answer. We've analyzed the situation for a couple of days, and it has happened several times again. Indeed, it seems like something is wrong with DAG deactivation in the `serialized_dag` table. In the scheduler container logs we spot this behavior very often (probably during heavy DAG runs) and for a lot of DAGs since our Airflow upgrade...

<img width="1507" height="618" alt="Image" src="https://github.com/user-attachments/assets/70029a3f-8cd0-433f-a632-6567924bfa3e" />

Moreover, it has become difficult to open and view DAG tasks in the UI, even after the DAG run has finished. It gets stuck trying to load/serialize the DAG for a long time.

<img width="865" height="837" alt="Image" src="https://github.com/user-attachments/assets/722a811f-fb40-4c4d-8b9b-e066be72d80f" />

I reckon we need to tune the scheduler/core parameters to stabilize the situation (increase DAG parse intervals, etc.). Could you recommend appropriate parameter values?

Our current server configuration in AWS:
- EC2 instance type: m6a.xlarge
- DB instance type: db.t4g.small
- RAM consumption peaks at only about 50% of full capacity
- CPU consumption is stable, averaging 20% (though peaks almost reach 100%)

<img width="2960" height="937" alt="Image" src="https://github.com/user-attachments/assets/d64a2d57-85a5-42ae-9937-064d314c98ec" />

Current Airflow params:
- 2 pools with 16 slots each.

```ini
# Apache Airflow `core` Configurations Section
AIRFLOW__CORE__EXECUTOR = "LocalExecutor"
AIRFLOW__CORE__LOAD_EXAMPLES = "False"
# It cannot exceed the total pool slot size, which is 32 (sum of 2 pools with 16 each)
AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG = "14"
# parallelism is the sum of tasks in the [open, queued, running] states
AIRFLOW__CORE__PARALLELISM = "64"
# The minimum interval (in seconds) after which the serialized DAGs in the DB should be updated.
# This helps in reducing the database write rate
AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL = "180"
# How often the serialized DAG will be re-fetched from the DB when it is already loaded in the DagBag in the webserver.
# Setting this higher will reduce load on the DB, but at the expense of displaying a possibly stale cached version of the DAG
AIRFLOW__CORE__MIN_SERIALIZED_DAG_FETCH_INTERVAL = "120"
# How long before timing out a python file import
AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT = "60"
# How long before timing out a DagFileProcessor, which processes a dag file
AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT = "120"
AIRFLOW__CORE__ENABLE_XCOM_PICKLING = "False"
AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_RECYCLE = "300"

# Apache Airflow `scheduler` Configurations Section
AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC = "8"
# The frequency that each DAG file is parsed, in seconds. Updates to DAGs are reflected after this interval
AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL = "180"
# Any new python DAG file placed in the dags folder takes this much time to be processed by Airflow and show up in the UI
AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL = "180"
AIRFLOW__SCHEDULER__PRINT_STATS_INTERVAL = "300"
# Should the scheduler issue SELECT ... FOR UPDATE in relevant queries.
# If this is set to False then you should not run more than a single scheduler at once.
AIRFLOW__SCHEDULER__USE_ROW_LEVEL_LOCKING = "False"
# The maximum number of DAGs to create DAG runs for per scheduler loop.
# Decrease the value to free resources for scheduling tasks. The default value is 10.
AIRFLOW__SCHEDULER__MAX_DAGRUNS_TO_CREATE_PER_LOOP = "2"
```

And there is another question: we have 2 DAGs (D1 and D2). D1 is scheduled at 00:00 and uses pool1; its average run duration is 3h. D2 is scheduled at 01:30 and uses pool2. BUT D2 is actually queued and starts only once D1 finishes, around 03:00, even though they use separate pools. How does that happen?
And how can we make D2 start at its scheduled time, 01:30? Thanks
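For reference, here is a small back-of-envelope sketch of the three concurrency limits we have configured, as we understand them. This is a deliberate simplification for illustration only, not the actual scheduler code; the names and logic here are our own, and the real scheduler applies more gates than these.

```python
# Sketch of the three concurrency gates from our config (simplified model,
# not actual Airflow scheduler logic).

PARALLELISM = 64                 # AIRFLOW__CORE__PARALLELISM
MAX_ACTIVE_TASKS_PER_DAG = 14    # AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG
POOL_SLOTS = {"pool1": 16, "pool2": 16}  # our two pools, 16 slots each


def can_queue_task(dag_running: int, pool: str, pool_used: int,
                   total_running: int) -> bool:
    """True if a new task instance passes all three configured limits:
    global parallelism, per-DAG active tasks, and pool slots."""
    return (
        total_running < PARALLELISM
        and dag_running < MAX_ACTIVE_TASKS_PER_DAG
        and pool_used < POOL_SLOTS[pool]
    )


# Scenario: D1 is running at its per-DAG cap (14 tasks in pool1).
# A D2 task targeting pool2 should still pass every gate:
print(can_queue_task(dag_running=0, pool="pool2", pool_used=0,
                     total_running=14))  # -> True

# A 15th concurrent D1 task, by contrast, is blocked by the per-DAG cap:
print(can_queue_task(dag_running=14, pool="pool1", pool_used=14,
                     total_running=14))  # -> False
```

By this accounting, a D2 task should be allowed to queue while D1 is still running, which is exactly why the 1.5-hour delay puzzles us; we suspect the cause lies outside these limits (for example, slow DAG parsing or our lowered `MAX_DAGRUNS_TO_CREATE_PER_LOOP = "2"`), but we would appreciate confirmation.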
