grayver opened a new issue #11331:
URL: https://github.com/apache/airflow/issues/11331


   **Apache Airflow version**: 1.10.12
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: Amazon EC2 instance, 4 CPU 
cores, 8GB RAM
   - **OS** (e.g. from /etc/os-release): Ubuntu 18.04.5 LTS (Bionic Beaver)
   - **Kernel** (e.g. `uname -a`): Linux ip-XX-XX-XX-XX.ec2.internal 
5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 
x86_64 x86_64 GNU/Linux
   - **Install tools**: Ansible Airflow role 
(https://github.com/idealista/airflow-role)
   - **Some Airflow configuration parameters**:
   ```
   executor = LocalExecutor
   sql_alchemy_conn = postgresql+psycopg2://user:pwd@aws-rds-server:5432/airflow_dev2
   sql_alchemy_pool_size = 20
   parallelism = 32
   dag_concurrency = 8
   task_concurrency = 4
   non_pooled_task_slot_count = 128
   task_runner = StandardTaskRunner
   max_threads = 4
   ```
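
As a rough illustration of why these limits alone should not stop other DAGs from starting, here is a minimal sketch that models only `parallelism` and `dag_concurrency` from the configuration above. The function `free_slots` is hypothetical and not part of Airflow; pools, `task_concurrency`, and the scheduler's real logic are deliberately ignored.

```python
# Values copied from the configuration in this report.
PARALLELISM = 32      # executor-wide cap on concurrently running task instances
DAG_CONCURRENCY = 8   # cap on concurrently running task instances per DAG


def free_slots(running_total, running_in_dag):
    """Hypothetical helper: how many more tasks of one DAG could start,
    looking only at the two limits above (a simplification)."""
    executor_room = max(PARALLELISM - running_total, 0)
    dag_room = max(DAG_CONCURRENCY - running_in_dag, 0)
    return min(executor_room, dag_room)


if __name__ == "__main__":
    # One long-running task is active in some other DAG;
    # the DAG we look at has nothing running yet.
    print(free_slots(running_total=1, running_in_dag=0))
```

Under these assumptions a single long-running task leaves 31 executor slots free, so these two limits would still allow up to 8 tasks per DAG.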
   
   **What happened**:
   
   We have 13 DAGs in our Airflow instance. Some of them occasionally process a 
large amount of data: typically they parse a large file, transform the parsed 
data, and load it into a database. There are also database processing tasks that 
involve long-running queries. As a result, some tasks can run for several hours. 
The problem is that these long-running tasks block all other tasks from 
being started. Tasks scheduled to run hourly are not started until the 
long-running task has completed. We also see a yellow bar in the Airflow web UI:
   ```
   The scheduler does not appear to be running. Last heartbeat was received XX minutes ago.
   The DAGs list may not update, and new tasks will not be scheduled.
   ```
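
To make the banner concrete: the sketch below assumes it is driven by the age of the scheduler's last recorded heartbeat compared with a threshold. The function name and the 30-second threshold are assumptions for illustration, not values taken from this installation.

```python
from datetime import datetime, timedelta


def scheduler_looks_stale(last_heartbeat, now, threshold_seconds=30):
    """Hypothetical check: True if the last heartbeat is older than the threshold.

    threshold_seconds=30 is an assumed value, not read from airflow.cfg.
    """
    return (now - last_heartbeat) > timedelta(seconds=threshold_seconds)


if __name__ == "__main__":
    now = datetime(2020, 10, 7, 12, 0, 0)
    # A heartbeat from 45 minutes ago would count as stale.
    print(scheduler_looks_stale(now - timedelta(minutes=45), now))
    # A heartbeat from 5 seconds ago would not.
    print(scheduler_looks_stale(now - timedelta(seconds=5), now))
```

If this model is right, a heartbeat that is minutes old means the scheduler loop itself has stalled, not merely that no task slots are available.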
   
   We examined the Airflow scheduler logs and found that the scheduler simply 
does not try to pick up new tasks while a long-running task is running. When no 
long-running task is running, the logs show the scheduler checking whether any 
task can run and checking the parallelism/concurrency limits for it. While a 
long-running task is running, there are no such log messages.
   
   Manual triggering doesn't help either - triggered tasks are not started until 
the long-running task has finished.
   
   **What you expected to happen**:
   
   We expect all other DAGs to start according to their schedule while a 
long-running task is running. This is how LocalExecutor should work according to 
the documentation.
   
   We also checked server resources in those cases: there is plenty of 
free RAM and CPU at that time, so that shouldn't be the cause.
   
   **Anything else we need to know**:
   
   This problem occurs every time.

