paulochf commented on issue #41822: URL: https://github.com/apache/airflow/issues/41822#issuecomment-3543533368
Hello, everyone! We have noticed the same issue as reported here since June 2025. I'm passing by only to add to the discussion, since we won't be upgrading Airflow versions anytime soon.

We use Airflow 2.5.1 on k8s with statsd enabled for Datadog, but only some metrics get registered. The "Airflow did not have enough time to emit the metrics before something terminated" hypothesis is strong and makes sense; is it an actual issue?

In our case, we have legacy Airflow DAGs that are roughly composed of boto3 EMR interactions for (1) cluster creation and (2) step additions (see the sketch at the end of this comment). Our list of working metrics differs from [@Ibraitas' list above](https://github.com/apache/airflow/issues/41822#issuecomment-2469921606). For the metrics that do arrive (e.g., `ti.finish`), if a task fails fast enough we don't get the measurement with `state=failed`, so our monitor ends up not noticing the failure.

We intend to migrate those DAGs in the next few months to a completely different design, using long-running tasks with sensors attached, which should mitigate the fail-fast issue. I can report back here on whether that fixed it. Still, I'd like to leave the question of whether the "fail fast" problem is known and which Airflow version fixes it.

Thank you!
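For context, here is a minimal sketch of the DAG shape I mean, assuming raw boto3 calls wrapped in `PythonOperator`s; the DAG id, cluster settings, and step definition are placeholders, not our real values:

```python
# Rough sketch of the legacy DAG pattern: raw boto3 EMR calls for
# (1) cluster creation and (2) step additions. All identifiers and
# EMR settings below are placeholders.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def create_cluster(**_):
    emr = boto3.client("emr")
    # run_job_flow returns the new cluster (job flow) id
    response = emr.run_job_flow(
        Name="legacy-emr-cluster",   # placeholder
        ReleaseLabel="emr-6.10.0",   # placeholder
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "InstanceCount": 1,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    # returned value is pushed to XCom by the PythonOperator
    return response["JobFlowId"]


def add_steps(ti, **_):
    emr = boto3.client("emr")
    cluster_id = ti.xcom_pull(task_ids="create_cluster")
    # A bad cluster id or state makes this raise almost immediately,
    # which is the "fail fast" case where the ti.finish data point
    # with state=failed never shows up in Datadog for us.
    emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[{
            "Name": "example-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {"Jar": "command-runner.jar", "Args": ["true"]},
        }],
    )


with DAG(
    dag_id="legacy_emr_pipeline",  # placeholder
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create = PythonOperator(task_id="create_cluster", python_callable=create_cluster)
    steps = PythonOperator(task_id="add_steps", python_callable=add_steps)
    create >> steps
```

Either task can raise within a second or two on a boto3 error, which matches the fail-fast window where the `state=failed` measurement never arrives.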
