Peter Bacsko created YUNIKORN-1874:
--------------------------------------
Summary: Data race: unlocked access in Context.updatePodCondition()
Key: YUNIKORN-1874
URL: https://issues.apache.org/jira/browse/YUNIKORN-1874
Project: Apache YuniKorn
Issue Type: Bug
Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko
Running some performance tests locally triggered data races:
{noformat}
WARNING: DATA RACE
Read at 0x00c008351538 by goroutine 50598:
k8s.io/api/core/v1.(*Pod).DeepCopyInto()
/home/bacskop/go/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3369
+0x44
k8s.io/api/core/v1.(*Pod).DeepCopy()
/home/bacskop/go/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3383
+0x7d4
github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated.func1()
/home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:358
+0x760

Previous write at 0x00c008351538 by goroutine 66:
k8s.io/kubernetes/pkg/api/v1/pod.UpdatePodCondition()
/home/bacskop/go/pkg/mod/k8s.io/[email protected]/pkg/api/v1/pod/util.go:380
+0x63d
github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).updatePodCondition()
/home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1023
+0x65d
github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).HandleContainerStateUpdate()
/home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1063
+0x30e
github.com/apache/yunikorn-k8shim/pkg/callback.(*AsyncRMCallback).UpdateContainerSchedulingState()
/home/bacskop/repos/incubator-yunikorn-k8shim/pkg/callback/scheduler_callback.go:213
+0x45
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
/home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
/home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
/home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).internalInspectOutstandingRequests()
/home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:84 +0x38
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.func3()
/home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x39
{noformat}
The problem is that {{Context.updatePodCondition()}} modifies the
{{PodStatus}} field of the pod object in place, without holding a lock, while
the same pod object is used as the source of deep copies elsewhere (see
{{Task.postTaskAllocated()}} in the trace above).
It might be better to store the condition separately and protect it with the
task lock.
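A minimal sketch of that idea, not the actual k8shim code: {{PodCondition}} and {{Task}} below are simplified, hypothetical stand-ins for {{v1.PodCondition}} and the shim's task type, and {{SetCondition()}} / {{Condition()}} are made-up method names. The point is only that the condition lives in the task, is copied on the way in and out, and is never written into the shared pod object.

```go
package main

import (
	"fmt"
	"sync"
)

// PodCondition is a simplified, hypothetical stand-in for v1.PodCondition.
type PodCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// Task is a hypothetical, stripped-down task that keeps its own copy of the
// condition instead of mutating the shared pod object.
type Task struct {
	lock      sync.RWMutex
	condition *PodCondition
}

// SetCondition stores a private copy of cond under the task lock.
// It returns true if the stored condition changed.
func (t *Task) SetCondition(cond PodCondition) bool {
	t.lock.Lock()
	defer t.lock.Unlock()
	if t.condition != nil && *t.condition == cond {
		return false
	}
	c := cond // copy, so the caller's value is never shared
	t.condition = &c
	return true
}

// Condition returns a copy of the stored condition and whether one is set.
func (t *Task) Condition() (PodCondition, bool) {
	t.lock.RLock()
	defer t.lock.RUnlock()
	if t.condition == nil {
		return PodCondition{}, false
	}
	return *t.condition, true
}

func main() {
	task := &Task{}
	cond := PodCondition{Type: "PodScheduled", Status: "False", Reason: "Unschedulable"}

	// Concurrent writers and readers only ever touch the task's own field,
	// and always under the task lock.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); task.SetCondition(cond) }()
		go func() { defer wg.Done(); task.Condition() }()
	}
	wg.Wait()

	got, ok := task.Condition()
	fmt.Println(got.Reason, ok)
}
```

With this shape, the code path that currently calls {{DeepCopy()}} on the pod would read the condition through the task and apply it to its own copy, so the shared pod is only ever read.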