NBardelot commented on issue #39717: URL: https://github.com/apache/airflow/issues/39717#issuecomment-2217235758
When I say "configuration specific", I mean specific to the use of Kubernetes, or Redis OSS, or a particular config value. We use very straightforward configurations. If I had to bet, I'd wager on something specific to the DAG/task load (and maybe that falls under your definition of "config specific").

Could you point to some statsd metrics that would help analyze the issue? (queue sizes, timeouts, something related to the scheduler load...)

Also, we're currently migrating our observability stack to Datadog, which provides out-of-the-box Airflow dashboards that might be more useful for such analysis than the ones we made ourselves. We're new to the tool and need some time to integrate it, but maybe you're familiar with their dashboards and know of something there that could be useful as well.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
