Squigilum opened a new issue #12995:
URL: https://github.com/apache/airflow/issues/12995


   **Apache Airflow version**: 2.0.0rc1
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl 
version`): 1.19.4
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:  Laptop with 6 cores and 
32GB RAM
   - **OS** (e.g. from /etc/os-release): Ubuntu 20.04.1 LTS
   - **Kernel** (e.g. `uname -a`): 5.4.0-56-generic
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   I am running the 2.0.0 release candidate in minikube using the Celery
executor.  It was installed using the Helm chart in git, with the executor
changed and a persistent volume claim added for storing DAGs. I'm testing
different scaling options by launching large numbers of tasks and evaluating
how quickly and consistently they run. The DAG is triggered manually through
the web server, and on most runs either some of the tasks fail with no
explanation or some tasks are left in the 'queued' state and never run.  The
tasks in the 'queued' state are shown as 'active' in the Flower dashboard but
do not appear to be actually running.
   
   As part of my testing I have increased the values of 
AIRFLOW__CORE__DAG_CONCURRENCY and AIRFLOW__CELERY__WORKER_CONCURRENCY.  This 
seems like it might exacerbate the problem but I have reproduced it with the 
default settings.
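
   For reference, a minimal sketch of how the two settings above map onto
Airflow's `AIRFLOW__{SECTION}__{KEY}` environment-variable convention. The
helper name `airflow_env_var` and the numeric values are illustrative
assumptions, not the values used in these runs.

```python
# Sketch: Airflow reads config overrides from environment variables named
# AIRFLOW__{SECTION}__{KEY}. Helper name and values are hypothetical.

def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an airflow.cfg option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Hypothetical example values, not the ones from the runs described above.
overrides = {
    ("core", "dag_concurrency"): 32,
    ("celery", "worker_concurrency"): 32,
}

# Environment variables are strings, so convert the values explicitly.
env = {airflow_env_var(s, k): str(v) for (s, k), v in overrides.items()}

for name, value in env.items():
    print(f"{name}={value}")
```

   The resulting name/value pairs would be set on the scheduler and worker
containers, for example through the chart's environment settings.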
   
   **What you expected to happen**: All tasks run successfully.
   
   **What do you think went wrong?** Initially I thought I was over-taxing the
system, but resource monitoring has shown nothing indicating this.  My system
has 11GB of RAM free and 4 CPUs, and CPU utilization never went over 30%.
   
   **How to reproduce it**:
   Attached is a simple DAG that produces the issue on my setup.
   
[concurrent_workflow.zip](https://github.com/apache/airflow/files/5675042/concurrent_workflow.zip)
   
   **Anything else we need to know**:
   I haven't seen anything indicating an error in the logs, but would be happy
to provide them if requested.
   
   **How often does this problem occur? Once? Every time etc?**  The majority
of my runs (75-90%) have resulted in between 1 and 4 tasks stuck in the
'queued' state. Failed tasks are less frequent (approximately 25% of runs).
   
   

