eanikindfi opened a new issue, #43432: URL: https://github.com/apache/airflow/issues/43432
### Apache Airflow version

2.10.2

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

We have Airflow `2.10.2` deployed in a Kubernetes cluster with the official Helm chart. The Helm release includes the `statsd` component, and Airflow sends its metrics to `statsd`. We use the Celery executor, so our tasks run inside worker pods. We also have a VictoriaMetrics release in this cluster and scrape metrics from `statsd` with a `VMScrapeConfig`.

Out of all the metrics provided by Airflow we need [2 essential ones](https://github.com/apache/airflow/blob/v2-10-stable/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst#gauges):

- `task.cpu_usage_percent.<dag_id>.<task_id>`
- `task.mem_usage_percent.<dag_id>.<task_id>`

We can see the lifecycles of our tasks/DAGs in the Airflow web interface, so we know when each task started and ended. But if we read metric values from `statsd` after a task has already ended (even hours after completion), we still get `cpu_usage` and `mem_usage` values for that task.

### What you think should happen instead?

According to the documentation, `task.cpu_usage_percent.<dag_id>.<task_id>` and `task.mem_usage_percent.<dag_id>.<task_id>` are gauges that show:

> Percentage of CPU/memory used by a task

So we assume that if a task ended at 02:15 PM, then at 02:16 PM or later Airflow should no longer send `task.cpu_usage_percent.*` or `task.mem_usage_percent.*` for this task to `statsd`, right? The whole point of these two metrics is to show how much of the resources each task/DAG is using at the moment, so that we can visualize the dynamics of Airflow's resource usage or build alerting on top of it. Correct me if I'm wrong. (A possible mitigation on the statsd-exporter side is sketched at the end of this report.)

### How to reproduce

**Configuration**

statsd:

```yaml
statsd:
  extraMappings:
    - match: airflow.task.cpu_usage.*.*
      name: "airflow_task_cpu_usage"
      help: "Percentage of CPU used by a task"
      labels:
        dag_id: "$1"
        task_id: "$2"
    - match: airflow.task.mem_usage.*.*
      name: "airflow_task_mem_usage"
      help: "Percentage of memory used by a task"
      labels:
        dag_id: "$1"
        task_id: "$2"
```

VMScrapeConfig:

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMScrapeConfig
metadata:
  name: airflow-service-scrape
  namespace: monitoring
spec:
  staticConfigs:
    - targets: [airflow-statsd.airflow.svc.cluster.local:9102]
      metricsPath: /metrics
      scrapeInterval: 30s
      scrapeTimeout: 15s
```

### Operating System

Kubernetes v1.31.0-eks

### Versions of Apache Airflow Providers

_No response_

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Even though I have provided the `VMScrapeConfig` configuration above, it is not needed to reproduce the issue: we also check the `statsd` exporter endpoint (port 9102) directly to see the raw metrics delivered from Airflow, and it shows the same stale values.

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
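For reference, a possible explanation and a mitigation sketch (not a confirmed fix): statsd gauges are "last value wins", and `prometheus/statsd_exporter` keeps exporting the last reported gauge value until it is overwritten or expires, so the gauge of a finished task simply freezes at its final value. The exporter's mapping configuration supports a per-mapping `ttl` field that drops a metric if it receives no updates within the given duration. Assuming the chart passes `statsd.extraMappings` through to the exporter's mapping file unchanged, something like the following should make the stale task gauges expire shortly after completion:

```yaml
statsd:
  extraMappings:
    - match: airflow.task.cpu_usage.*.*
      name: "airflow_task_cpu_usage"
      help: "Percentage of CPU used by a task"
      ttl: 2m  # assumption: drop the gauge if no update arrives within 2 minutes
      labels:
        dag_id: "$1"
        task_id: "$2"
    - match: airflow.task.mem_usage.*.*
      name: "airflow_task_mem_usage"
      help: "Percentage of memory used by a task"
      ttl: 2m
      labels:
        dag_id: "$1"
        task_id: "$2"
```

The `ttl` needs to be longer than the interval at which running tasks report their usage; otherwise gauges for still-running tasks would also expire between updates.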
