[
https://issues.apache.org/jira/browse/YUNIKORN-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405581#comment-17405581
]
Weiwei Yang commented on YUNIKORN-829:
--------------------------------------
Hi [~yuchaoran2011], regarding metrics collection: if we build it into the
k8shim, there will be no concern about this being a k8s-only solution. Ideally,
we can define a metrics collector interface, and one implementation can be
metrics-server-based. Please let us know if you have any further ideas about
this. Thanks.
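
To make the interface idea concrete, here is a rough sketch of what it could
look like. Everything in it is an assumption for illustration only: the names
MetricsCollector, PodUsage, Usage, AggregateByQueue and staticCollector are
hypothetical and are not existing YuniKorn or Kubernetes APIs. The
staticCollector stands in for a metrics-server-based implementation, which
would query real pod usage instead of returning fixed samples.

```go
package main

import (
	"fmt"
	"sort"
)

// Usage is the measured (not requested) resource consumption of one pod.
type Usage struct {
	CPUMilli int64 // CPU in millicores
	MemBytes int64 // memory in bytes
}

// PodUsage ties one usage sample to the queue the pod was scheduled in.
type PodUsage struct {
	Pod   string
	Queue string
	Usage Usage
}

// MetricsCollector is a hypothetical interface (not an existing YuniKorn
// API). A metrics-server-based collector would be one implementation.
type MetricsCollector interface {
	Collect() ([]PodUsage, error)
}

// staticCollector is a placeholder implementation that returns fixed samples.
type staticCollector struct {
	samples []PodUsage
}

func (s staticCollector) Collect() ([]PodUsage, error) {
	return s.samples, nil
}

// AggregateByQueue sums the collected pod usage per queue.
func AggregateByQueue(c MetricsCollector) (map[string]Usage, error) {
	samples, err := c.Collect()
	if err != nil {
		return nil, fmt.Errorf("collect pod usage: %w", err)
	}
	totals := make(map[string]Usage)
	for _, s := range samples {
		t := totals[s.Queue]
		t.CPUMilli += s.Usage.CPUMilli
		t.MemBytes += s.Usage.MemBytes
		totals[s.Queue] = t
	}
	return totals, nil
}

func main() {
	// Made-up sample data; queue names are illustrative only.
	c := staticCollector{samples: []PodUsage{
		{Pod: "spark-exec-1", Queue: "root.spark", Usage: Usage{CPUMilli: 500, MemBytes: 1 << 30}},
		{Pod: "spark-exec-2", Queue: "root.spark", Usage: Usage{CPUMilli: 250, MemBytes: 2 << 30}},
		{Pod: "batch-1", Queue: "root.batch", Usage: Usage{CPUMilli: 100, MemBytes: 1 << 29}},
	}}
	totals, err := AggregateByQueue(c)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Sort queue names so the printed order is stable.
	queues := make([]string, 0, len(totals))
	for q := range totals {
		queues = append(queues, q)
	}
	sort.Strings(queues)
	for _, q := range queues {
		fmt.Printf("%s cpu=%dm mem=%dB\n", q, totals[q].CPUMilli, totals[q].MemBytes)
	}
}
```

With the collector behind an interface, the aggregation code does not need to
know where the samples come from, so a non-k8s source could be plugged in
later without touching it.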
> Produce metrics on queue-level resource utilization
> ---------------------------------------------------
>
> Key: YUNIKORN-829
> URL: https://issues.apache.org/jira/browse/YUNIKORN-829
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: core - scheduler, shim - kubernetes
> Reporter: Chaoran Yu
> Priority: Major
>
> YuniKorn already has metrics on the resources requested/allocated for each
> queue, but we have no visibility into how much of the allocated resources is
> actually being used. Take Spark as an example: an under-optimized job may
> request 1 TB of total executor memory while the actual processing logic uses
> only 100 GB. As a result, other jobs might not be able to fit in the queue.
> Having a metric that shows the real utilization will help members of a queue
> better understand their job characteristics and optimize the jobs.
> The K8s metrics server has metrics on real utilization. YK may be able to
> perform some aggregations to arrive at the stats at the queue level. This is
> a k8s-specific solution though.