tardunge commented on issue #51213: URL: https://github.com/apache/airflow/issues/51213#issuecomment-2942741734
@jroachgolf84 I haven't found a fix yet, but I think I'm close to the root cause. The coroutine might not be progressing because of a potential deadlock. I see the issue more often, and more consistently, after a message from SQS has been consumed.

When a message gets consumed, the trigger yields a `TriggerEvent` and then breaks out of the `run` method at [L184](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/triggers/sqs.py#L184). After this, the trigger enters a completed state, and the main job responsible for running the trigger coroutines marks it for removal and adds a new instance. The new instance gets spawned, and this is where the stalling happens, at [L187](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/triggers/sqs.py#L187). The get-connection method eventually talks to the supervisor at [triggerer_job_runner.py L394](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L394).

Meanwhile, there is a busy loop responsible for spawning triggers and maintaining their lifecycle at [L749](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L749), and this loop uses a method called `sync_state_to_supervisor` at [L936](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L936). I strongly suspect that `GetConnection` and the main event loop are contending for this `LOCK` at [L965](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L965), which at some point results in a deadlock.

It would be great to have a deterministic, simulated test case for things like this.
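To make the hypothesis concrete, here is a minimal sketch of the kind of deterministic test I have in mind. It is not Airflow's code: `simulate`, `get_connection`, `sync_state`, and the two events are hypothetical stand-ins, and it assumes the stall comes from one coroutine holding a lock while it awaits a reply that the other coroutine can only deliver after taking the same lock.

```python
import asyncio


async def simulate(hold_lock_while_waiting: bool, timeout: float = 0.2) -> str:
    """Return "stalled" if the two coroutines deadlock, otherwise "ok".

    All names here are hypothetical; they only model the suspected pattern.
    """
    lock = asyncio.Lock()            # stand-in for the shared lock
    reply = asyncio.Event()          # stand-in for the supervisor's reply
    request_sent = asyncio.Event()   # fixes the interleaving so the test is deterministic

    async def get_connection() -> None:
        if hold_lock_while_waiting:
            # Suspected pattern: keep the lock while waiting for the reply.
            async with lock:
                request_sent.set()
                await reply.wait()
        else:
            # Alternative: release the lock before waiting for the reply.
            async with lock:
                request_sent.set()
            await reply.wait()

    async def sync_state() -> None:
        # Runs only after the request is out, then needs the same lock
        # before it can deliver the reply.
        await request_sent.wait()
        async with lock:
            reply.set()

    tasks = [
        asyncio.create_task(get_connection()),
        asyncio.create_task(sync_state()),
    ]
    try:
        # A bounded wait turns a hang into an observable test result.
        await asyncio.wait_for(asyncio.gather(*tasks), timeout)
        return "ok"
    except asyncio.TimeoutError:
        return "stalled"
```

Calling `asyncio.run(simulate(True))` models the suspected contention, and `asyncio.run(simulate(False))` models releasing the lock before waiting. Whether the real code path matches either case still needs to be confirmed against `triggerer_job_runner.py`.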
