Peter Bacsko created YUNIKORN-1874:
--------------------------------------

             Summary: Data race: unlocked access in Context.updatePodCondition()
                 Key: YUNIKORN-1874
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1874
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: shim - kubernetes
            Reporter: Peter Bacsko
            Assignee: Peter Bacsko


Running some performance tests locally triggered data races:
{noformat}
WARNING: DATA RACE
Read at 0x00c008351538 by goroutine 50598:
  k8s.io/api/core/v1.(*Pod).DeepCopyInto()
      /home/bacskop/go/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3369 +0x44
  k8s.io/api/core/v1.(*Pod).DeepCopy()
      /home/bacskop/go/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3383 +0x7d4
  github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated.func1()
      /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:358 +0x760

Previous write at 0x00c008351538 by goroutine 66:
  k8s.io/kubernetes/pkg/api/v1/pod.UpdatePodCondition()
      /home/bacskop/go/pkg/mod/k8s.io/[email protected]/pkg/api/v1/pod/util.go:380 +0x63d
  github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).updatePodCondition()
      /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1023 +0x65d
  github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).HandleContainerStateUpdate()
      /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1063 +0x30e
  github.com/apache/yunikorn-k8shim/pkg/callback.(*AsyncRMCallback).UpdateContainerSchedulingState()
      /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/callback/scheduler_callback.go:213 +0x45
  github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
      /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
  github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
      /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
  github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
      /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
  github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).internalInspectOutstandingRequests()
      /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:84 +0x38
  github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.func3()
      /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x39
 {noformat}

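The problem is that {{Context.updatePodCondition()}} modifies the 
{{PodStatus}} of the pod object in place, while the same pod object is used 
as the source of deep copies elsewhere. In the trace above, 
{{Task.postTaskAllocated()}} calls {{DeepCopy()}} on it from another 
goroutine, and the write is not synchronized with that read.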

It might be better to store the condition separately and protect it with the 
task lock.
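
A minimal sketch of that idea, not the actual fix: the condition lives in its own field on the task and is only read or written while the task lock is held, so the shared pod object is never mutated. All names here ({{PodCondition}} as a simplified stand-in for the Kubernetes type, the trimmed-down {{Task}}, {{setPodCondition()}}, {{getPodCondition()}}) are hypothetical and not taken from the k8shim code.

```go
package main

import "sync"

// PodCondition is a hypothetical, simplified stand-in for the Kubernetes
// pod condition type; only a few string fields are modelled.
type PodCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// Task is a hypothetical, trimmed-down task holding only what this sketch needs.
type Task struct {
	lock         sync.RWMutex
	podCondition *PodCondition // guarded by lock; the shared pod object is not touched
}

// setPodCondition stores a private copy of cond under the task lock.
// It returns true if the stored condition changed.
func (t *Task) setPodCondition(cond *PodCondition) bool {
	if cond == nil {
		return false
	}
	t.lock.Lock()
	defer t.lock.Unlock()
	if t.podCondition != nil && *t.podCondition == *cond {
		return false // nothing new to record
	}
	c := *cond // copy, so later changes by the caller cannot leak in
	t.podCondition = &c
	return true
}

// getPodCondition returns a copy of the stored condition under the read lock.
// The second return value is false if no condition has been stored yet.
func (t *Task) getPodCondition() (PodCondition, bool) {
	t.lock.RLock()
	defer t.lock.RUnlock()
	if t.podCondition == nil {
		return PodCondition{}, false
	}
	return *t.podCondition, true
}
```

With something like this, the scheduler callback path would call {{setPodCondition()}} instead of writing into the pod, and any code that needs the condition (for example when building a status update) would read it through {{getPodCondition()}}. Copying on both sides keeps callers from sharing a pointer into the task's state.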


