HsiuChuanHsu commented on code in PR #56690: URL: https://github.com/apache/airflow/pull/56690#discussion_r2450343945
##########
airflow-core/docs/administration-and-deployment/logging-monitoring/metrics.rst:
##########
@@ -254,6 +254,8 @@ Name    Description
 ``pool.scheduled_slots``    Number of scheduled slots in the pool. Metric with pool_name tagging.
 ``pool.starving_tasks.<pool_name>``    Number of starving tasks in the pool
 ``pool.starving_tasks``    Number of starving tasks in the pool. Metric with pool_name tagging.
+``task.cpu_usage_percent.<dag_id>.<task_id>``    CPU usage percentage of a task

Review Comment:
Thanks for all your feedback!

IMO, with a large number of DAGs and tasks, we need to choose the most efficient granularity for monitoring data. I think the task level should be the finest-grained unit for core monitoring: the combination `<dag_id>.<task_id>` is the most practical and efficient level for routine monitoring.

> Possibly reporting the stats on individual instances as gauge will produce a high cardinality statistics. Possibly the cardinality there is not "too high" if we do it per individual tis. But I am not sure.

If we drill down to the `try_id` or `map_index` level, that will likely produce a very high-cardinality metric set. In my view that level of detail is too fine and potentially inefficient for large-scale monitoring, but I'm not sure what others think.

> When we are using gauge, only the last one counts, and previous values are replaced by the following ones - so effectively what we have is the valus in last execution of the "primary key". Not sure what is the best approach here.

I think recording the value from the last execution of the same primary key (`<dag_id>.<task_id>`) should be sufficient. Time-series monitoring tools (e.g., Prometheus) store each collected sample with a timestamp, so no extra effort is needed to trace past values for a given primary key.
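To make the granularity trade-off concrete, here is a minimal sketch of how the number of gauge series grows when `try_id` and `map_index` are added to the key, and of the last-write-wins behaviour of a gauge. It is a toy in-memory model, not Airflow's stats client: `GaugeStore`, `simulate`, the tag names, and the CPU readings are all hypothetical and exist only for illustration.

```python
from itertools import product


class GaugeStore:
    """Toy in-memory gauge backend (hypothetical): one value per series key."""

    def __init__(self):
        self.series = {}

    def gauge(self, name, value, **tags):
        # A series is identified by the metric name plus its sorted tag set.
        key = (name, tuple(sorted(tags.items())))
        # Gauge semantics: a new value overwrites the previous one.
        self.series[key] = value


def simulate(dag_ids, task_ids, try_ids, map_indexes, fine_grained):
    """Emit one CPU gauge per (dag, task, try, map_index) combination."""
    store = GaugeStore()
    for dag_id, task_id, try_id, map_index in product(
        dag_ids, task_ids, try_ids, map_indexes
    ):
        tags = {"dag_id": dag_id, "task_id": task_id}
        if fine_grained:
            # Assumed extra tags for the finer level discussed in the comment.
            tags.update(try_id=try_id, map_index=map_index)
        cpu = float(try_id * 10 + map_index)  # made-up reading
        store.gauge("task.cpu_usage_percent", cpu, **tags)
    return store


dags = [f"dag_{i}" for i in range(3)]
tasks = [f"task_{i}" for i in range(4)]
try_ids = [1, 2]
map_indexes = [0, 1, 2]

coarse = simulate(dags, tasks, try_ids, map_indexes, fine_grained=False)
fine = simulate(dags, tasks, try_ids, map_indexes, fine_grained=True)

print(len(coarse.series))  # 3 dags * 4 tasks = 12 series
print(len(fine.series))    # 3 * 4 * 2 tries * 3 map indexes = 72 series
```

At the `<dag_id>.<task_id>` level each series simply holds the most recent reading, while the finer key multiplies the series count by the number of tries and mapped instances. In a real deployment the history of earlier readings would come from the monitoring backend's timestamped samples rather than from the gauge itself.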
