[ https://issues.apache.org/jira/browse/YUNIKORN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Santa Clara updated YUNIKORN-2950:
---------------------------------------
    Description: 
We observed some of our larger clusters (with autoscaling via Karpenter) begin to misreport their queue usage after upgrading to 1.5.2.

As an example, given a leaf queue root.tiers.2, we observed the following state:
{code:java}
allocated capacity(root.tiers.2) : pods 200 memory 2.187347412109375 Tib vcore 0.2 k ephemeral-storage 4.294967295999999TB{code}
but when we summed the allocations in the full-state dump, we found:
{code:java}
root.tiers.2 : pods 0 memory 0.0 Tib vcore 0.0 k ephemeral-storage 0.0 TB{code}
Similarly, we examined the number of running pods in K8s and found 0. The queue allocations were clearly off.

This was fixed by applying the following patch, which removes the race condition:
{code:java}
 func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bool) error {
        // check this queue: failure stops checks if the allocation is not part of a node addition
-       fit, newAllocated := sq.allocatedResFits(alloc)
+       fit, _ := sq.allocatedResFits(alloc)
        if !nodeReported && !fit {
                return fmt.Errorf("allocation (%v) puts queue '%s' over maximum allocation (%v), current usage (%v)",
                        alloc, sq.QueuePath, sq.maxResource, sq.allocatedResource)
@@ -1058,6 +1058,7 @@ func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bo
        sq.Lock()
        defer sq.Unlock()
        // all OK update this queue
+       newAllocated := resources.Add(sq.allocatedResource, alloc)
        sq.allocatedResource = newAllocated
        sq.updateAllocatedResourceMetrics()
        return nil {code}
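For background on why the patch works: the 1.5.2 code path is a classic lost-update (check-then-act) race. allocatedResFits computes the would-be new total under a read lock, that lock is released, and the stale total is then assigned under the write lock, silently discarding any allocation changes that landed in between. The patch closes the window by recomputing the total while the write lock is held. The following is a minimal, self-contained Go sketch of the two patterns; the tracker type and its integer totals are illustrative stand-ins for Queue and resources.Resource, not YuniKorn's actual code:
{code:java}
package main

import (
	"fmt"
	"sync"
)

// tracker is an illustrative stand-in for Queue: an RWMutex guarding a
// running total, mirroring sq.allocatedResource.
type tracker struct {
	sync.RWMutex
	allocated int
}

// fits mirrors allocatedResFits: it computes the would-be new total under
// a read lock. The read lock is released on return, so the returned value
// can already be stale by the time the caller uses it.
func (t *tracker) fits(alloc int) (bool, int) {
	t.RLock()
	defer t.RUnlock()
	return true, t.allocated + alloc // max-capacity check elided for brevity
}

// incRacy mirrors the 1.5.2 code path: a stale newAllocated computed
// outside the write lock overwrites concurrent updates (lost update).
func (t *tracker) incRacy(alloc int) {
	_, newAllocated := t.fits(alloc)
	t.Lock()
	defer t.Unlock()
	t.allocated = newAllocated
}

// incFixed mirrors the patch: the fit check still runs first, but the new
// total is recomputed while the write lock is held, so nothing is lost.
func (t *tracker) incFixed(alloc int) {
	if fit, _ := t.fits(alloc); !fit {
		return
	}
	t.Lock()
	defer t.Unlock()
	t.allocated += alloc
}

// run fires 1000 concurrent increments of 1 and returns the final total.
func run(inc func(*tracker, int)) int {
	t := &tracker{}
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inc(t, 1)
		}()
	}
	wg.Wait()
	return t.allocated
}

func main() {
	fmt.Println("racy :", run((*tracker).incRacy))  // typically prints < 1000
	fmt.Println("fixed:", run((*tracker).incFixed)) // always prints 1000
}
{code}
Worth noting: Go's race detector will not flag the racy variant, because every access to the total happens under a lock; it is a logical lost update rather than a data race. That matches how this bug surfaced, as queue totals drifting on busy clusters rather than as a detectable crash.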
  
This appears to be [fixed|#L1041] in 1.6.0, although I have not confirmed it.

It appears to me that the race condition was introduced 
[here|https://github.com/apache/yunikorn-core/pull/839/files#diff-27632d48eb925e150a33bc92370ceaa66c31048018d11ca7a53a0b50ab7250acL1033].

> Race condition in 1.5.2 causes queue usage to be incorrectly calculated
> -----------------------------------------------------------------------
>
>                 Key: YUNIKORN-2950
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2950
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Paul Santa Clara
>            Priority: Major
>