eanikindfi opened a new issue, #43432: URL: https://github.com/apache/airflow/issues/43432
### Apache Airflow version

2.10.2

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

We have Airflow `2.10.2` deployed in a Kubernetes cluster with the official Helm chart. The Helm release includes the `statsd` component, and Airflow sends its metrics to `statsd`. We use the Celery executor, so our tasks run inside worker pods. We also have a VictoriaMetrics release in this cluster and scrape metrics from `statsd` with a `VMScrapeConfig`.

Out of all the metrics provided by Airflow we need [2 essential ones](https://github.com/apache/airflow/blob/v2-10-stable/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst#gauges):

- `task.cpu_usage_percent.<dag_id>.<task_id>`
- `task.mem_usage_percent.<dag_id>.<task_id>`

We can see the lifecycles of our tasks/DAGs in the Airflow web interface, so we know when each task started and ended. But if we read metric values from `statsd` after a task has already ended (even hours after completion), we still get `cpu_usage` and `mem_usage` values for that task.

### What you think should happen instead?

According to the documentation, `task.cpu_usage_percent.<dag_id>.<task_id>` and `task.mem_usage_percent.<dag_id>.<task_id>` are gauges that show:

> Percentage of CPU/memory used by a task

So we assume that if a task ended at 02:15 PM, then at 02:16 PM or later Airflow should no longer send `task.cpu_usage_percent.*` or `task.mem_usage_percent.*` for this task to `statsd`, right? The whole point of these two metrics is to show how much of the resources each task/DAG is using at the moment, so that we can visualize the dynamics of Airflow's resource usage or build alerting on top of it. Correct me if I'm wrong. (A possible mitigation on the statsd-exporter side is sketched at the end of this report.)

### How to reproduce

**Configuration**

statsd:

```yaml
statsd:
  extraMappings:
    - match: airflow.task.cpu_usage.*.*
      name: "airflow_task_cpu_usage"
      help: "Percentage of CPU used by a task"
      labels:
        dag_id: "$1"
        task_id: "$2"
    - match: airflow.task.mem_usage.*.*
      name: "airflow_task_mem_usage"
      help: "Percentage of memory used by a task"
      labels:
        dag_id: "$1"
        task_id: "$2"
```

VMScrapeConfig:

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMScrapeConfig
metadata:
  name: airflow-service-scrape
  namespace: monitoring
spec:
  staticConfigs:
    - targets: [airflow-statsd.airflow.svc.cluster.local:9102]
      metricsPath: /metrics
      scrapeInterval: 30s
      scrapeTimeout: 15s
```

### Operating System

Kubernetes v1.31.0-eks

### Versions of Apache Airflow Providers

_No response_

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Even though I have provided the `VMScrapeConfig` configuration above, it is not needed to reproduce the issue: we also check the `statsd` exporter endpoint (port 9102) directly to see the raw metrics delivered from Airflow, and it shows the same stale values.

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
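For reference, a possible explanation and a mitigation sketch (not a confirmed fix): statsd gauges are "last value wins", and `prometheus/statsd_exporter` keeps exporting the last reported gauge value until it is overwritten or expires, so the gauge of a finished task simply freezes at its final value. The exporter's mapping configuration supports a per-mapping `ttl` field that drops a metric if it receives no updates within the given duration. Assuming the chart passes `statsd.extraMappings` through to the exporter's mapping file unchanged, something like the following should make the stale task gauges expire shortly after completion:

```yaml
statsd:
  extraMappings:
    - match: airflow.task.cpu_usage.*.*
      name: "airflow_task_cpu_usage"
      help: "Percentage of CPU used by a task"
      ttl: 2m  # assumption: drop the gauge if no update arrives within 2 minutes
      labels:
        dag_id: "$1"
        task_id: "$2"
    - match: airflow.task.mem_usage.*.*
      name: "airflow_task_mem_usage"
      help: "Percentage of memory used by a task"
      ttl: 2m
      labels:
        dag_id: "$1"
        task_id: "$2"
```

The `ttl` needs to be longer than the interval at which running tasks report their usage; otherwise gauges for still-running tasks would also expire between updates.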
