NBardelot opened a new issue, #40027:
URL: https://github.com/apache/airflow/issues/40027

   ### Apache Airflow version
   
   2.9.1
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   Some metrics that use tags (mainly `file_path`, `dag_id` and `task_id`) are 
not correctly mapped in the Helm chart (see `chart/files/statsd-mappings.yml`). 
This is probably linked to a feature introduced in Airflow v2.6 that avoids 
creating a new metric for each new DAG/task/file and uses tags under common 
metrics instead.
   
   I noticed that `airflow_dag_processing_last_duration` has no label in my 
Prometheus, and found that it is not mapped. For the moment I've added this 
as a workaround:
   
   ```
   statsd:
     enabled: true
     ...
     # workaround:
     extraMappings:
       - match: airflow.dag_processing.last_duration.*
         name: "airflow_dag_processing_last_duration"
         labels:
           dag_file: "$1"
   ```
   
   ### What you think should happen instead?
   
   Every metric being logged using tags should be mapped in 
`chart/files/statsd-mappings.yml` in order for labels to be applied by the 
statsd-exporter.
   
   As of Airflow 2.9.1 this is a list of calls to the Stats class that I think 
are using tags but missing a mapping: 
   
   | Metric name | Unmapped labels |
   | --- | --- |
   | `dag_processing.processes` | `dag_file: "$1"` |
   | `dag_processing.last_duration` | `dag_file: "$1"` |
   | `dag_processing.processor_timeouts` | `dag_file: "$1"` |
   | `sla_missed` | `dag_id: "$1"`, `task_id: "$2"` |
   | `sla_email_notification_failure` | `dag_id: "$1"`, `task_id: "$2"` |
   | `dag_file_refresh_error` | `dag_file: "$1"` |
   | `pool.queued_slots` | `pool: "$1"` |
   | `pool.running_slots` | `pool: "$1"` |
   | `pool.deferred_slots` | `pool: "$1"` |
   | `zombies_killed` | `dag_id: "$1"`, `task_id: "$2"` |
   | `dag.callback_exceptions` | `dag_id: "$1"` |
   | `task_restored_to_dag` | `dag_id: "$1"`, `task_id: "$2"` |
   | `task_removed_from_dag` | `dag_id: "$1"`, `task_id: "$2"` |
   | `task_instance_created` | `dag_id: "$1"`, `task_id: "$2"` |
   
   *Note: this list is the result of a quick `grep`, so it might be incomplete 
and I might have misunderstood the behaviour of some of the metrics. Whoever 
provides a fix should not take it as absolute truth.*
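   
   For illustration, here is how mappings for two of the rows above might look 
as `extraMappings`, following the same pattern as my workaround. This is an 
untested sketch: it assumes the metrics are emitted with the `airflow` prefix 
and with the values appended to the name in the order shown in the table, and 
that each `*` matches a single dot-separated component. The Prometheus names 
are simply the statsd names with dots replaced by underscores.
   
   ```yaml
   statsd:
     enabled: true
     extraMappings:
       # One-label case (assumed name: airflow.dag_processing.processor_timeouts.<dag_file>)
       - match: airflow.dag_processing.processor_timeouts.*
         name: "airflow_dag_processing_processor_timeouts"
         labels:
           dag_file: "$1"
       # Two-label case (assumed name: airflow.sla_missed.<dag_id>.<task_id>, not verified)
       - match: airflow.sla_missed.*.*
         name: "airflow_sla_missed"
         labels:
           dag_id: "$1"
           task_id: "$2"
   ```
   
   The other rows would follow the same two shapes, with the label names taken 
from the "Unmapped labels" column.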
   
   ### How to reproduce
   
   * Deploy Airflow in Kubernetes (for example Minikube) with statsd turned on
   * Add a DAG with a mock operator and run it
   * Wait for the metrics to be exported by statsd
   * Run `curl` against the statsd exporter's `/metrics` endpoint from a nearby pod
   * Observe that `dag_processing_last_duration` and 
`dag_processing_last_duration_{DAG_id}` metrics both exist
   * Observe that `dag_processing_last_duration` lacks the `dag_file` label
   
   ### Operating System
   
   Kubernetes
   
   ### Versions of Apache Airflow Providers
   
   The 'statsd' requirements are installed using the official Apache 
constraints for Python 3.10 and Airflow 2.9.1.
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   `.Values.statsd.overrideMappings` is not set (see 
`chart/templates/configmaps/statsd-configmap.yaml`); we use the standard 
out-of-the-box mappings.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

