Yu-Lin Chen created YUNIKORN-2327:
-------------------------------------

             Summary: Race condition during update Occupied Resource from Shim to Core
                 Key: YUNIKORN-2327
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2327
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - common, shim - kubernetes
            Reporter: Yu-Lin Chen
            Assignee: Yu-Lin Chen


When YuniKorn initializes, existing non-YuniKorn pods (ForeignPods) are counted as the node's occupied resources. A SchedulerAPI.UpdateNode(request) call is triggered asynchronously to update the node's occupied resources in the core. However, a race condition occurs on the core side during this asynchronous update: the updates can be applied out of order, so a stale snapshot may overwrite a newer one.

{*}How to reproduce{*}:
 - Add a 2-second delay in the core when processing the first pod's update. After restarting YuniKorn, the node's final occupied resource equals the first pod's resource size. ([example|https://github.com/apache/yunikorn-core/compare/master...chenyulin0719:yunikorn-core:YUNIKORN-2313-ADD-2-SECOND-DELAY#diff-3bd07740ee12121844b14ddafec10a36332fe2cd80421174110edf042f780e23R397-R399])
 - This issue is the root cause of YUNIKORN-2313.
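The reproduction can be modelled with a small Go sketch, assuming the core applies each asynchronous update as an unconditional overwrite (last writer wins). The `node` type and the `setOccupied`, `updateAsync` and `simulate` functions are hypothetical names, not YuniKorn code. The three snapshots are the cumulative values from the shim logs of this issue, and a 50 ms sleep on the first update stands in for the 2-second delay used to reproduce the bug.

{code:go}
package main

import (
	"fmt"
	"sync"
	"time"
)

// node is a hypothetical stand-in for the core's node object.
type node struct {
	mu       sync.Mutex
	occupied map[string]int64
}

// setOccupied overwrites the occupied resources unconditionally
// (last writer wins); this is the assumed behaviour that allows the race.
func (n *node) setOccupied(r map[string]int64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.occupied = r
}

// updateAsync handles one update on its own goroutine, optionally delayed,
// mimicking an asynchronous UpdateNode call.
func updateAsync(n *node, r map[string]int64, delay time.Duration, wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(delay)
		n.setOccupied(r)
	}()
}

// simulate sends the three cumulative snapshots in shim order, delaying
// only the first one, and returns what the node ends up with.
func simulate() map[string]int64 {
	n := &node{}
	var wg sync.WaitGroup
	snapshots := []map[string]int64{
		{"pods": 1},
		{"memory": 52428800, "pods": 2, "vcore": 100},
		{"memory": 576716800, "pods": 3, "vcore": 200},
	}
	for i, s := range snapshots {
		delay := time.Duration(0)
		if i == 0 {
			delay = 50 * time.Millisecond // hold back the first (oldest) snapshot
		}
		updateAsync(n, s, delay, &wg)
	}
	wg.Wait() // all goroutines finished; safe to read n.occupied
	return n.occupied
}

func main() {
	fmt.Println(simulate()) // the stale first snapshot wins: map[pods:1]
}
{code}

Because each request carries the node's full cumulative total rather than a delta, whichever request the core applies last determines the final value, regardless of the order in which the shim sent them.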

{*}Error Logs{*}: ([E2E test link-v1.29.0|https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7530262471])
Shim logs: (yk8s-worker)
{code}
2024-01-15T14:39:20.135Z Shim trigger SchedulerAPI() request: occupied: resources:{key:"pods" value:{value:1}}
2024-01-15T14:39:20.135Z Shim trigger SchedulerAPI() request: occupied: resources:{key:"memory" value:{value:52428800}} resources:{key:"pods" value:{value:2}} resources:{key:"vcore" value:{value:100}}
2024-01-15T14:39:20.136Z Shim trigger SchedulerAPI() request: occupied: resources:{key:"memory" value:{value:576716800}} resources:{key:"pods" value:{value:3}} resources:{key:"vcore" value:{value:200}}
{code}
Core logs: (yk8s-worker)
{code}
2024-01-15T14:39:20.137Z set occupiedResource: map[memory:52428800 pods:2 vcore:100]
2024-01-15T14:39:20.137Z set occupiedResource: map[memory:576716800 pods:3 vcore:200]
2024-01-15T14:39:22.136Z set occupiedResource: map[pods:1]
{code}
{*}Final occupied resource in state dump{*}:
{code:json}
...
        {
          "nodeID": "yk8s-worker",
          "attributes": {
            "ready": "true",
            "si.io/hostname": "yk8s-worker",
            "si.io/rackname": "/rack-default",
            "si/node-partition": "[mycluster]default"
          },
           ...
          "occupied": {
            "pods": 1
          }...
        }
{code}
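The core logs show the updates being applied as pods:2, then pods:3, and finally the delayed pods:1. One possible mitigation, sketched here only as an illustration and not necessarily the approach taken upstream, is for the shim to stamp each snapshot with a monotonically increasing sequence number (a hypothetical field, not part of the current SchedulerAPI) so the core can discard any snapshot older than the last one it applied. The `update`, `guardedNode`, `apply` and `replay` names are hypothetical.

{code:go}
package main

import "fmt"

// update is a hypothetical message carrying a shim-assigned sequence number.
type update struct {
	seq      uint64
	occupied map[string]int64
}

// guardedNode remembers the sequence number of the last applied update
// and ignores anything older, so a late-arriving stale snapshot cannot win.
type guardedNode struct {
	lastSeq  uint64
	occupied map[string]int64
}

// apply returns true if the update was applied, false if it was stale.
func (n *guardedNode) apply(u update) bool {
	if u.seq <= n.lastSeq {
		return false
	}
	n.lastSeq = u.seq
	n.occupied = u.occupied
	return true
}

// replay applies updates in the order they arrived at the core.
func replay(updates []update) map[string]int64 {
	n := &guardedNode{}
	for _, u := range updates {
		n.apply(u)
	}
	return n.occupied
}

func main() {
	// Arrival order from the core logs; seq reflects the shim's send order (assumed).
	arrived := []update{
		{seq: 2, occupied: map[string]int64{"memory": 52428800, "pods": 2, "vcore": 100}},
		{seq: 3, occupied: map[string]int64{"memory": 576716800, "pods": 3, "vcore": 200}},
		{seq: 1, occupied: map[string]int64{"pods": 1}},
	}
	fmt.Println(replay(arrived)) // map[memory:576716800 pods:3 vcore:200]
}
{code}

Replaying the logged arrival order through the guard keeps the newest snapshot (pods:3) instead of the stale one. The guard only works if sequence numbers are assigned in the same order the shim computes the snapshots; serializing the per-node updates would be an alternative.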



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
