MatrixManAtYrService edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-849031715


   While trying to recreate this, I wrote a 
[stress test](https://github.com/MatrixManAtYrService/airflow-git-sync/blob/master/scheduler_stress.py),
which I ran overnight on my local microk8s cluster 
(release:2.0.1+beb8af5ac6c438c29e2c186145115fb1334a3735, configured like 
[this](https://github.com/MatrixManAtYrService/airflow-git-sync/blob/master/zsh.stdin)).
   
   I was hoping that it would be fully stuck by the time I woke up. Instead, 
there were only two stuck tasks: 
    
   
![stucktasks](https://user-images.githubusercontent.com/5834582/119712787-34db0f80-be1e-11eb-9a41-875dd88c0566.gif)
   
   Deleting the scheduler pod and letting Kubernetes recreate it caused the two 
stuck tasks to complete. At about 1:00 PM I cleared the state of all previous 
tasks. For a little while, the scheduler managed to both backfill the cleared 
tasks and keep up with scheduled runs, but then something happened that caused 
most of the tasks to get stuck.
   
   <img width="825" alt="Screen Shot 2021-05-26 at 9 58 19 PM" 
src="https://user-images.githubusercontent.com/5834582/119764748-34b73000-be6f-11eb-99b0-c481905db56b.png">
   
   Things were still limping along after that, but I never again saw more than 
three tasks running at once. This time, resetting the scheduler pod did **not** 
remedy the situation; the scheduler just returned to its prior anemic state. 
Here's a dump of the database and a snapshot of the scheduler logs right after 
a restart:
   
   
[db_and_scheduler_logs.tar.gz](https://github.com/apache/airflow/files/6551081/db_and_scheduler_logs.tar.gz)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

