[
https://issues.apache.org/jira/browse/YUNIKORN-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871861#comment-17871861
]
Wilfred Spiegelenburg commented on YUNIKORN-2789:
-------------------------------------------------
The only place where we use the {{ComponentWiseMin()}} function is in the queue
call that we do not want anymore. I am pushing through a refactor at the same time:
rename {{ComponentWiseMinPermissive()}} to become just {{ComponentWiseMin()}}.
> Queue internalGetMax should use permissive calculator
> -----------------------------------------------------
>
> Key: YUNIKORN-2789
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2789
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - common
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
>
> We have documented for queue resources that:
> {quote}Resources that are not specified in the list are not limited, for max
> resources, or guaranteed in the case of guaranteed resources.
> {quote}
> However, in the queue implementation, internalGetMax, we call
> resources.ComponentWiseMin(). This returns a 0 value for each type that is
> missing from either of the two resources passed in. That does not line up
> with the documentation.
> Example of getting the maximum resources of a queue using GetMaxQueueSet.
> What I would expect based on the documentation:
>
> {code:java}
> parent: max{memory: 100G}
> parent.child: max{vcore: 100}
> => result child max{memory: 100G, vcore: 100}{code}
>
>
> Currently we get:
> {code:java}
> parent: max{memory: 100G}
> parent.child: max{vcore: 100}
> => result child max{memory: 0, vcore: 0}{code}
> Similarly, when we add the root and call GetMaxResource:
> {code:java}
> root: max{memory: 100G, vcore: 200}
> root.parent: max{vcore: 100}
> root.parent.child: max{nvidia.com/gpu: 10}
> => result parent max{memory: 0, vcore: 100}
> => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code}
> The fact that a resource type does not exist, even in the root, should not
> mean the limit is set to zero. The nodes that expose the specific resource
> might not have been registered or scaled up yet.
>
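The difference between the two calculations described in the issue can be sketched in Go as follows. This is a minimal illustration and not the YuniKorn implementation: the {{Resource}} map type and the {{strictMin}} and {{permissiveMin}} helpers are hypothetical names, and quantities are plain integers (memory in G). The strict variant treats a missing type as 0, the permissive variant treats a missing type as "not limited".

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a queue resource:
// a map from resource type name to quantity.
type Resource map[string]int64

func minInt64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

// strictMin treats a type missing from either side as 0,
// which is the behaviour the issue describes as the bug.
func strictMin(a, b Resource) Resource {
	out := Resource{}
	for k, v := range a {
		out[k] = minInt64(v, b[k]) // b[k] is 0 when k is not set in b
	}
	for k, v := range b {
		out[k] = minInt64(v, a[k]) // a[k] is 0 when k is not set in a
	}
	return out
}

// permissiveMin treats a type missing from one side as unlimited,
// so the value from the side that defines it is kept.
func permissiveMin(a, b Resource) Resource {
	out := Resource{}
	for k, v := range a {
		if bv, ok := b[k]; ok {
			out[k] = minInt64(v, bv)
		} else {
			out[k] = v
		}
	}
	for k, v := range b {
		if _, ok := a[k]; !ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	parent := Resource{"memory": 100}
	child := Resource{"vcore": 100}
	fmt.Println(strictMin(parent, child))     // map[memory:0 vcore:0]
	fmt.Println(permissiveMin(parent, child)) // map[memory:100 vcore:100]
}
```

Applied top-down to the three-level example (root, then root.parent, then root.parent.child), the strict variant reproduces the zero values shown in the issue, while the permissive variant would carry the root memory limit and the parent vcore limit down to the child alongside its own nvidia.com/gpu limit.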