[
https://issues.apache.org/jira/browse/YUNIKORN-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888422#comment-17888422
]
Craig Condit commented on YUNIKORN-2908:
----------------------------------------
YUNIKORN-2855 had an incomplete fix. On closer inspection it is subtly
broken: it does not take the {{state}} parameter into account when
calculating the already-seen resources. We should probably rebuild that
functionality to use the built-in {{Describe()}} method to iterate over all
existing values and remove those where the state matches but we have no new
value. This is not a simple change.
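
To make the intended behaviour concrete, here is a minimal sketch of a state-aware cleanup. It is not the YuniKorn implementation and does not use the Prometheus client: a plain map stands in for the gauge vector, and the names {{seriesKey}}, {{gaugeStore}} and {{replaceQueueState}}, as well as the label layout (queue, state, resource), are assumptions made for illustration only.

```go
package main

// seriesKey identifies one gauge series. The label names are hypothetical
// stand-ins for whatever labels the real queue metrics carry.
type seriesKey struct {
	queue    string
	state    string // assumed values such as "guaranteed" or "max"
	resource string // assumed values such as "memory" or "vcore"
}

// gaugeStore is a toy in-memory stand-in for a gauge vector.
type gaugeStore map[seriesKey]float64

// replaceQueueState removes every existing series with the same queue and
// state that has no new value, then writes the new values. Series that
// belong to other states or other queues are left untouched.
func replaceQueueState(store gaugeStore, queue, state string, values map[string]float64) {
	for key := range store {
		if key.queue != queue || key.state != state {
			continue // different queue or state: not ours to clean up
		}
		if _, ok := values[key.resource]; !ok {
			delete(store, key) // deleting while ranging over a map is safe in Go
		}
	}
	for resource, v := range values {
		store[seriesKey{queue: queue, state: state, resource: resource}] = v
	}
}
```

Calling {{replaceQueueState}} with an empty or nil map for a given state drops all series of that state for the queue, which corresponds to the case where the guaranteed or max config is removed entirely. Matching on the state is the step the earlier fix missed: without it, a resource seen under one state would wrongly count as "already seen" for the other.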
> metrics not removed when queue or queue's guaranteed/max resource config is
> removed
> -----------------------------------------------------------------------------------
>
> Key: YUNIKORN-2908
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2908
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Hengzhe Guo
> Assignee: Hengzhe Guo
> Priority: Major
>
> 1. After a queue is removed, its metrics continue to be reported by
> Prometheus. This is fine for metrics like allocated resources, because they
> will just be 0, but it does not make sense for guaranteed and max resources:
> it gives the wrong impression that resources are still given to the queue. I
> propose to unregister all of the queue's metrics when it is removed.
> 2. If the queue is not removed but its guaranteed or max resource config is
> removed, or just one resource type is removed from the config, the metrics
> are also not cleaned up. These metrics are only updated when there is a new
> valid value, not when the value is 'null'. I propose to always delete all
> existing guaranteed and max resource metrics of the queue and then add back
> the new values every time we apply the configs.