bhupixb opened a new issue #18630: URL: https://github.com/apache/airflow/issues/18630
### Apache Airflow version

2.1.3

### Operating System

Debian GNU/Linux 10 (buster)

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==2.0.0
apache-airflow-providers-celery==2.0.0
apache-airflow-providers-cncf-kubernetes==2.0.2
apache-airflow-providers-docker==2.0.0
apache-airflow-providers-elasticsearch==2.0.2
apache-airflow-providers-ftp==2.0.0
apache-airflow-providers-google==5.0.0
apache-airflow-providers-grpc==2.0.0
apache-airflow-providers-hashicorp==2.0.0
apache-airflow-providers-http==2.0.0
apache-airflow-providers-imap==2.0.0
apache-airflow-providers-microsoft-azure==2.0.0
apache-airflow-providers-mysql==2.1.0
apache-airflow-providers-postgres==2.0.0
apache-airflow-providers-redis==2.0.0
apache-airflow-providers-sendgrid==2.0.0
apache-airflow-providers-sftp==2.1.0
apache-airflow-providers-slack==4.0.0
apache-airflow-providers-sqlite==2.0.0
apache-airflow-providers-ssh==2.1.0

### Deployment

Other

### Deployment details

We have modified the official Airflow Helm chart to meet our needs.

Kubernetes version: 1.15.12

### What happened

We followed the official [documentation](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/metrics.html) to set up Airflow metrics with statsd, and we use Prometheus to pull those metrics from statsd. Here is our configmap for the statsd mapping.yaml: https://ideone.com/cotYSG.

The issue is that the metrics `dagrun.duration.success.<dag_id>` and `dagrun.duration.failed.<dag_id>` are not arriving in statsd. Most other metrics arrive fine.

Our statsd configuration:

    metrics:
      statsd_on: 'True'
      statsd_port: 9125
      statsd_prefix: airflow
      statsd_host: airflow-statsd

### What you expected to happen

The metrics `dagrun.duration.success.<dag_id>` and `dagrun.duration.failed.<dag_id>` should also arrive in statsd. We need them to set up alerts in Prometheus, e.g. for long-running DAGs.
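To help narrow down where the metrics get lost, here is a minimal sketch of what one of the expected timers would look like on the wire. It assumes the standard StatsD timer line format `<name>:<value>|ms` and the `airflow` prefix, host, and port from the configuration above. The helper names and the DAG id `example_dag` are hypothetical and are not part of Airflow.

```python
import socket


def statsd_timing_line(prefix: str, metric: str, millis: int) -> str:
    """Build a StatsD timer line of the form '<name>:<value>|ms'."""
    return f"{prefix}.{metric}:{int(millis)}|ms"


def send_statsd_line(line: str, host: str, port: int) -> None:
    """Send one StatsD line over UDP.

    UDP is fire-and-forget, so a successful send does not prove that the
    receiver accepted or mapped the metric.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Hypothetical DAG id, used only for illustration.
line = statsd_timing_line("airflow", "dagrun.duration.success.example_dag", 1500)
print(line)

# To send it by hand from a pod inside the cluster (host and port as configured above):
# send_statsd_line(line, "airflow-statsd", 9125)
```

If a hand-sent line like this shows up in Prometheus, the mapping and exporter side is probably fine and the problem is more likely on the emitting side. If it does not show up, the mapping configuration would be the first place to look.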
### How to reproduce

We are using our own custom Helm chart, so we are not sure how others can reproduce this. We run everything inside a Kubernetes cluster, with statsd, the scheduler, and the webserver each running in their own pods.

### Anything else

This happens 99% of the time. Occasionally the two metrics above do show up in Prometheus, but we have not found any correlation explaining why they appeared at that particular time or for that particular DAG.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
