[ 
https://issues.apache.org/jira/browse/YUNIKORN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821010#comment-17821010
 ] 

Peter Bacsko edited comment on YUNIKORN-2423 at 2/29/24 3:36 PM:
-----------------------------------------------------------------

I think the main problem from our perspective is that we cannot reject the 
resize operation (maybe this will change in the future if there's a need for 
it). If it is accepted by the kubelet, we have to process it no matter what.

There's a potential race condition which is preventable with checks in 
{{IncreaseTrackedResource()}}:
1. {{Headroom()}} says there's enough quota
2. Successful resize request comes in, flows through the core
3. We call {{IncreaseTrackedResource()}} and go over the quota

So this can be prevented, but what if the resize arrives just after we have 
left {{IncreaseTrackedResource()}}? There's not much we can do about that: the 
allocation is already processed at that point.

There's one thing I was thinking about: we can handle resizes in a special way 
with preemption. If the resource usage of an allocation was enlarged, we mark 
the allocation, essentially reducing its priority, so that it is more likely 
to become a preemption victim. We could even have a separate preemption mode 
which only deals with resizes (not sure if this is really useful, just a 
random thought).
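
A rough sketch of the marking idea, purely for illustration. The {{allocation}} type, the {{resized}} flag and {{orderVictims()}} are hypothetical names, not existing YuniKorn fields or functions. It assumes that victims are picked from the front of a sorted candidate list and that a lower priority value means "preempt first".

```go
package main

import (
	"fmt"
	"sort"
)

// allocation is a hypothetical, simplified record; the resized flag is the
// marker described above and is not an existing YuniKorn field.
type allocation struct {
	id       string
	priority int32
	resized  bool
}

// orderVictims sorts candidates so the most likely victims come first:
// resized allocations ahead of untouched ones, then lower priority first.
func orderVictims(candidates []allocation) {
	sort.SliceStable(candidates, func(i, j int) bool {
		a, b := candidates[i], candidates[j]
		if a.resized != b.resized {
			return a.resized
		}
		return a.priority < b.priority
	})
}

func main() {
	c := []allocation{
		{id: "a", priority: 1},
		{id: "b", priority: 5, resized: true},
		{id: "c", priority: 3},
	}
	orderVictims(c)
	for _, a := range c {
		fmt.Println(a.id) // b, a, c
	}
}
```

Here the flag dominates the ordering; a softer variant could subtract a fixed penalty from the priority instead, which is closer to "reducing its priority" as worded above.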


> Remove unnecessary boolean return value from the tracking code
> --------------------------------------------------------------
>
>                 Key: YUNIKORN-2423
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2423
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Minor
>              Labels: pull-request-available
>
> QueueTracker has two methods which both have an unnecessary return value:
>  
> {noformat}
> increaseTrackedResource() bool
> decreaseTrackedResource() (bool, bool)
> {noformat}
> The value from {{increaseTrackedResource()}} is always true. It used to be 
> different, but it no longer has any relevance.
> The same goes for {{decreaseTrackedResource()}}: only the first boolean, 
> which indicates whether a tracker can be removed, can change.
> Also, {{UserTracker.increaseTrackedResource()}} can be simplified as the 
> increment always succeeds and does not need to return anything:
> {noformat}
> func (ut *UserTracker) increaseTrackedResource(queuePath string, 
> applicationID string, usage *resources.Resource) bool {
>       ut.Lock()
>       defer ut.Unlock()
>       hierarchy := strings.Split(queuePath, configs.DOT)
>       ut.events.sendIncResourceUsageForUser(ut.userName, queuePath, usage)
>       increased := ut.queueTracker.increaseTrackedResource(hierarchy, 
> applicationID, user, usage)
>       if increased {
>               ... // branch always taken
>       }
>       return increased
> }
> {noformat}
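
The simplification described in the issue could look roughly like the sketch below. The types are hypothetical, stripped-down stand-ins (lower-case names, a single int64 instead of {{*resources.Resource}}, no events, no application ID), and {{configs.DOT}} is assumed to be ".". It only illustrates dropping the return value, not the actual change.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// queueTracker and userTracker are hypothetical, stripped-down stand-ins for
// the YuniKorn types; resources are reduced to a single int64.
type queueTracker struct {
	usage map[string]int64
}

// increaseTrackedResource adds usage to every level of the hierarchy. It
// cannot fail, so it returns nothing.
func (qt *queueTracker) increaseTrackedResource(hierarchy []string, usage int64) {
	for i := range hierarchy {
		qt.usage[strings.Join(hierarchy[:i+1], ".")] += usage
	}
}

type userTracker struct {
	sync.Mutex
	queueTracker *queueTracker
}

// increaseTrackedResource no longer returns a bool: the increment always
// succeeds, so the caller has nothing to check.
func (ut *userTracker) increaseTrackedResource(queuePath string, usage int64) {
	ut.Lock()
	defer ut.Unlock()
	// "." is assumed to be the value of configs.DOT.
	ut.queueTracker.increaseTrackedResource(strings.Split(queuePath, "."), usage)
}

func main() {
	ut := &userTracker{queueTracker: &queueTracker{usage: map[string]int64{}}}
	ut.increaseTrackedResource("root.parent.leaf", 4)
	fmt.Println(ut.queueTracker.usage["root"], ut.queueTracker.usage["root.parent.leaf"]) // 4 4
}
```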


