NBardelot opened a new issue, #40027:
URL: https://github.com/apache/airflow/issues/40027
### Apache Airflow version
2.9.1
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
Some metrics that use tags (essentially `file_path`, `dag_id`, `task_id`) are
not correctly mapped in the Helm chart (see `chart/files/statsd-mappings.yml`).
This is probably linked to a feature introduced in Airflow 2.6 that avoids
creating a new metric for each new DAG/task/file and instead reports common
metrics with tags.
Yet I stumbled upon `airflow_dag_processing_last_duration` having no label in
my Prometheus, and found that it was not mapped. I have added the following as
a workaround for the moment:
```yaml
statsd:
  enabled: true
  ...
  # workaround:
  extraMappings:
    - match: airflow.dag_processing.last_duration.*
      name: "airflow_dag_processing_last_duration"
      labels:
        dag_file: "$1"
```
### What you think should happen instead?
Every metric that is logged with tags should be mapped in
`chart/files/statsd-mappings.yml` so that the labels are applied by the
statsd-exporter.
As of Airflow 2.9.1, here is the list of calls to the `Stats` class that I
think use tags but are missing a mapping (a sketch of possible mapping
additions follows the table):
| Metric name | Unmapped labels |
| --- | --- |
| `dag_processing.processes` | `dag_file: "$1"` |
| `dag_processing.last_duration` | `dag_file: "$1"` |
| `dag_processing.processor_timeouts` | `dag_file: "$1"` |
| `sla_missed` | `dag_id: "$1"`, `task_id: "$2"` |
| `sla_email_notification_failure` | `dag_id: "$1"`, `task_id: "$2"` |
| `dag_file_refresh_error` | `dag_file: "$1"` |
| `pool.queued_slots` | `pool: "$1"` |
| `pool.running_slots` | `pool: "$1"` |
| `pool.deferred_slots` | `pool: "$1"` |
| `zombies_killed` | `dag_id: "$1"`, `task_id: "$2"` |
| `dag.callback_exceptions` | `dag_id: "$1"` |
| `task_restored_to_dag` | `dag_id: "$1"`, `task_id: "$2"` |
| `task_removed_from_dag` | `dag_id: "$1"`, `task_id: "$2"` |
| `task_instance_created` | `dag_id: "$1"`, `task_id: "$2"` |
*Note: as this is the result of a quick `grep`, the list might be incomplete
and I might have misunderstood the behaviour of some metrics. Whoever provides
a fix should not take it as absolute truth.*
### How to reproduce
* Deploy Airflow in Kubernetes (for example Minikube) with statsd turned on
* Add a DAG with a mock operator and run it
* Wait for the metrics to be exported to statsd
* Run `curl` against the statsd-exporter endpoint's `/metrics` from a nearby
pod (see the sketch after this list)
* Observe that both the `dag_processing_last_duration` and
`dag_processing_last_duration_{DAG_id}` metrics exist
* Observe that `dag_processing_last_duration` lacks the `dag_file` label
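For example, a throwaway pod along these lines can be used to scrape the exporter. The service name `airflow-statsd`, the `airflow` namespace, and port `9102` are assumptions for a default Helm release named `airflow`; adjust them to your deployment:

```yaml
# Hypothetical one-off pod that curls the statsd-exporter metrics endpoint.
# Service name, namespace and port are assumptions for a release named "airflow".
apiVersion: v1
kind: Pod
metadata:
  name: statsd-metrics-check
  namespace: airflow
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:8.7.1
      args: ["-s", "http://airflow-statsd:9102/metrics"]
```

Then `kubectl logs statsd-metrics-check -n airflow | grep dag_processing_last_duration` shows the exported samples and their labels.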
### Operating System
Kubernetes
### Versions of Apache Airflow Providers
The 'statsd' requirements are installed using the official Apache
constraints for Python 3.10 and Airflow 2.9.1.
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
No `.Values.statsd.overrideMappings` (see
`chart/templates/configmaps/statsd-configmap.yaml`), we use the standard
out-of-the-box mappings.
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)