grayver opened a new issue #11331: URL: https://github.com/apache/airflow/issues/11331

**Apache Airflow version**: 1.10.12

**Environment**:

- **Cloud provider or hardware configuration**: Amazon EC2 instance, 4 CPU cores, 8GB RAM
- **OS** (e.g. from /etc/os-release): Ubuntu 18.04.5 LTS (Bionic Beaver)
- **Kernel** (e.g. `uname -a`): Linux ip-XX-XX-XX-XX.ec2.internal 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- **Install tools**: Ansible Airflow role (https://github.com/idealista/airflow-role)
- **Some Airflow configuration parameters**:

```
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://user:pwd@aws-rds-server:5432/airflow_dev2
sql_alchemy_pool_size = 20
parallelism = 32
dag_concurrency = 8
task_concurrency = 4
non_pooled_task_slot_count = 128
task_runner = StandardTaskRunner
max_threads = 4
```

**What happened**:

We have 13 DAGs in our Airflow instance. Under some circumstances, some of them process a large amount of data. Usually this means parsing a large file, transforming the parsed data, and loading it into a database. There are also database processing tasks that involve long-running queries. As a result, some tasks can run for several hours.

The problem is that those long-running tasks block all other tasks from starting. Tasks that are scheduled to run hourly are not started until the long-running task has completed. We also see a yellow bar in the Airflow web UI:

```
The scheduler does not appear to be running. Last heartbeat was received XX minutes ago. The DAGs list may not update, and new tasks will not be scheduled.
```

We examined the Airflow scheduler logs and found that the scheduler does not try to pick up new tasks while a long-running task is running. When no long-running task is running, the logs show the scheduler checking whether any task can run and checking the parallelism/concurrency limits for it. While a long-running task is running, there are no such log messages. Manual triggering doesn't help either: triggered tasks are not started until the long-running task has finished.
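
To make the limits in question concrete, here is a simplified model of the two slot limits from the configuration above. It is only an illustration under stated assumptions, not Airflow's scheduler code: `SlotLimits` and `can_start` are hypothetical names, and the model considers only `parallelism` and `dag_concurrency`, ignoring pools and per-task limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotLimits:
    """Hypothetical container; values copied from the configuration above."""
    parallelism: int = 32      # max task instances running at once overall
    dag_concurrency: int = 8   # max task instances running at once per DAG


def can_start(running_total: int, running_in_dag: int,
              limits: SlotLimits = SlotLimits()) -> bool:
    """Return True if one more task fits under both limits (simplified model)."""
    return (running_total < limits.parallelism
            and running_in_dag < limits.dag_concurrency)


# One long-running task is active in some DAG; an hourly task in a
# different DAG (with nothing running yet) wants to start.
print(can_start(running_total=1, running_in_dag=0))
```

Under this model, a single long-running task leaves both limits far from exhausted, which is why we do not think the configured limits explain the blocked tasks.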

**What you expected to happen**:

We expect all other DAGs to start according to their schedule while a long-running task is running. This is how LocalExecutor should work according to the documentation. We also checked server resources in those cases: there was plenty of free RAM and CPU at the time, so resource exhaustion shouldn't be the cause.

**Anything else we need to know**:

This problem occurs every time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
