paulochf commented on issue #41822:
URL: https://github.com/apache/airflow/issues/41822#issuecomment-3543533368

   Hello, everyone! We have been seeing the same issue reported here since June 2025. I'm stopping by just to add to the discussion, not because we can do much about it (we won't be upgrading Airflow versions anytime soon).
   
   We use Airflow 2.5.1 on Kubernetes with StatsD enabled for Datadog, but only some of the metrics are registered. The "Airflow did not have enough time to emit the metrics before something terminated" hypothesis is strong and makes sense; is it a confirmed issue?
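   
   For anyone else debugging this, one quick way to rule out agent/network problems before suspecting Airflow itself is to push a test counter straight to the StatsD endpoint from inside the cluster. This is only a sketch: the host, port, and prefix below are placeholders and would need to match the `[metrics]` settings (`statsd_host`, `statsd_port`, `statsd_prefix`).
   
   ```python
   # Sketch: send a test counter directly to the StatsD endpoint to rule out
   # agent/network problems. Host, port, and prefix are placeholders and must
   # match airflow.cfg [metrics] (statsd_host, statsd_port, statsd_prefix).
   import statsd
   
   client = statsd.StatsClient(
       host="datadog-agent.monitoring.svc.cluster.local",  # placeholder agent address
       port=8125,
       prefix="airflow",
   )
   client.incr("debug.statsd_connectivity_check")
   ```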
   
   In our case, we have legacy Airflow DAGs that roughly consist of boto3 EMR interactions: 1. cluster creation and 2. step additions. Our list of working metrics differs from [@Ibraitas' list above](https://github.com/apache/airflow/issues/41822#issuecomment-2469921606). For the metrics that do arrive (e.g., `ti.finish`), if the task fails fast enough we don't get the measurement with `state=failed`, so our monitors end up not noticing the failure.
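   
   In case it helps others hitting the same gap, a possible workaround (just a sketch, untested on our side) would be to emit a custom failure counter from an `on_failure_callback`, so the monitor does not depend only on `ti.finish` with `state=failed` being flushed in time. The metric name is made up, and it still assumes the worker survives long enough to run the callback.
   
   ```python
   # Sketch of a workaround: emit a custom failure counter from an
   # on_failure_callback so monitoring does not rely only on ti.finish with
   # state=failed being flushed. The metric name is made up for illustration,
   # and it assumes statsd_allow_list is not filtering custom names and that
   # the worker lives long enough to run the callback.
   from airflow.stats import Stats
   
   
   def emit_failure_metric(context):
       ti = context["task_instance"]
       Stats.incr(f"custom.task_failed.{ti.dag_id}.{ti.task_id}")
   
   
   default_args = {
       "on_failure_callback": emit_failure_metric,
   }
   ```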
   
   In the next few months we intend to migrate those DAGs to a completely different setup, using long-running tasks with sensors attached, which should mitigate the fail-fast issue (rough sketch below). I can report back here on whether that fixes our problem. Still, I'd like to leave the question of whether the "fail fast" problem is a known issue and which Airflow version fixes it.
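   
   For reference, the pattern we're moving towards looks roughly like this sketch. The dag id, task ids, the upstream `create_cluster` task, and the step definitions are placeholders, and it assumes the amazon provider's `EmrAddStepsOperator`/`EmrStepSensor`; the point is just that a long-lived sensor task instance gives the StatsD client time to emit.
   
   ```python
   # Rough sketch of the long-running-task-plus-sensor pattern (placeholders:
   # dag id, task ids, SPARK_STEPS, and the upstream "create_cluster" task,
   # which would be an EMR cluster-creation task not shown here).
   # Assumes apache-airflow-providers-amazon is installed.
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
   from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor
   
   SPARK_STEPS = []  # placeholder: your EMR step definitions
   
   with DAG(
       dag_id="emr_steps_with_sensor",
       start_date=datetime(2025, 1, 1),
       schedule=None,
       catchup=False,
   ):
       add_steps = EmrAddStepsOperator(
           task_id="add_steps",
           job_flow_id="{{ ti.xcom_pull(task_ids='create_cluster') }}",
           steps=SPARK_STEPS,
           aws_conn_id="aws_default",
       )
   
       watch_step = EmrStepSensor(
           task_id="watch_step",
           job_flow_id="{{ ti.xcom_pull(task_ids='create_cluster') }}",
           step_id="{{ ti.xcom_pull(task_ids='add_steps')[0] }}",
           aws_conn_id="aws_default",
       )
   
       add_steps >> watch_step
   ```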
   
   Thank you!

