deepchand commented on issue #19192: URL: https://github.com/apache/airflow/issues/19192#issuecomment-1217867629
I am also facing the same problem. Details are below.

- Airflow: 2.2.2
- Providers: apache-airflow[celery,redis,microsoft.azure,slack] 2.2.2
- Constraints: https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.6.txt
- Postgres version: 11
- Executor: Celery executor
- Cluster: 1 webserver, 2 schedulers, and 6 workers

When the problem occurs: when we run more than about 100 DAG runs in parallel, this warning starts to appear:

The scheduler does not appear to be running. Last heartbeat was received X minutes ago

The number of minutes keeps increasing, up to 20-22 minutes, if we schedule more DAG runs in parallel. The scheduler does not schedule new tasks even though we have enough resources available on all components.

These are some of the config parameters we are using:

- job_heartbeat_sec = 10
- scheduler_heartbeat_sec = 5
- scheduler_idle_sleep_time = 1
- min_file_process_interval = 300
- dag_dir_list_interval = 300
- scheduler_health_check_threshold = 300
- scheduler_zombie_task_threshold = 600
- max_tis_per_query = 512
- use_row_level_locking = True
- max_dagruns_to_create_per_loop = 50
- parsing_processes = 8

When we searched the scheduler logs, we found that the scheduling loop time keeps increasing:

<img width="1361" alt="Screenshot 2022-08-17 at 4 36 01 PM" src="https://user-images.githubusercontent.com/18530606/185104188-f3911f9f-f1f8-42e2-ba97-eba029b9b094.png">
<img width="1335" alt="Screenshot 2022-08-17 at 4 36 18 PM" src="https://user-images.githubusercontent.com/18530606/185104211-f498996a-f9a1-4268-8537-4d10b8cef630.png">

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
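For readers trying to relate the warning in the comment above to the reported settings, here is a rough sketch of how a heartbeat-age check against `scheduler_health_check_threshold` behaves. It is a simplified model, not Airflow's actual code: the function name `scheduler_looks_unhealthy`, the inline config string, and the timestamps are all made up for illustration, and placing both keys under `[scheduler]` is an assumption about the reporter's `airflow.cfg`.

```python
from configparser import ConfigParser
from datetime import datetime, timedelta, timezone

# Two of the values reported in the comment; the [scheduler] section
# placement is an assumption for this sketch.
CFG = """
[scheduler]
scheduler_heartbeat_sec = 5
scheduler_health_check_threshold = 300
"""


def scheduler_looks_unhealthy(last_heartbeat, now, threshold_sec):
    """Hypothetical helper: True when the last heartbeat is older than the threshold."""
    return (now - last_heartbeat).total_seconds() > threshold_sec


parser = ConfigParser()
parser.read_string(CFG)
threshold = parser.getint("scheduler", "scheduler_health_check_threshold")

# Arbitrary illustrative timestamps, not taken from the reporter's logs.
now = datetime(2022, 8, 17, 12, 0, tzinfo=timezone.utc)
stale_heartbeat = now - timedelta(minutes=20)  # lag in the range reported above
fresh_heartbeat = now - timedelta(seconds=5)   # one heartbeat interval old

print(scheduler_looks_unhealthy(stale_heartbeat, now, threshold))
print(scheduler_looks_unhealthy(fresh_heartbeat, now, threshold))
```

Under this model, a heartbeat that is 20 minutes old exceeds the 300-second threshold by a wide margin. That suggests the scheduler loop itself is stalling, rather than the threshold being too tight.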
