QuintenBruynseraede commented on issue #33647: URL: https://github.com/apache/airflow/issues/33647#issuecomment-1830753096
@shubhransh-eb I believe we're seeing the same issue on Airflow 2.7.2, though we are using an RDS Postgres instance. I'm looking for a starting point for further investigation.

Some facts:
- Our triggerer stops heartbeating after roughly 20 seconds (give or take two heartbeat periods), presumably after picking up a sizable number of triggers. Once `cfg.triggerer_health_check_threshold` has elapsed, the liveness probes start failing.
- Under normal load we are running about 150 triggers as well.
- Running a Postgres `ANALYZE task_instance;` (or the same on any other table involved in your query) did not resolve the issue for us.
- I can only reproduce this issue on our Production environment, which has about 7 million task instances in the metadata DB. On an empty environment, a single triggerer handles up to 1000 triggers without any issue.

How did you find out where the triggerer loop is spending a lot of time in your case? I'm not able to get anything from the debug logs except the fact that no more heartbeats happen.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
