[ 
https://issues.apache.org/jira/browse/YUNIKORN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-1874.
------------------------------------
    Fix Version/s: 1.4.0
       Resolution: Fixed

Merged to master.

> Data race: unlocked access in Context.updatePodCondition()
> ----------------------------------------------------------
>
>                 Key: YUNIKORN-1874
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1874
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> Running some performance tests locally triggered data races:
> {noformat}
> ==================
> WARNING: DATA RACE
> Read at 0x00c008351538 by goroutine 50598:
>   k8s.io/api/core/v1.(*Pod).DeepCopyInto()
>       
> /home/bacskop/go/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3369
>  +0x44
>   k8s.io/api/core/v1.(*Pod).DeepCopy()
>       
> /home/bacskop/go/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3383
>  +0x7d4
>   
> github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated.func1()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:358 
> +0x760
> Previous write at 0x00c008351538 by goroutine 66:
>   k8s.io/kubernetes/pkg/api/v1/pod.UpdatePodCondition()
>       
> /home/bacskop/go/pkg/mod/k8s.io/[email protected]/pkg/api/v1/pod/util.go:380 
> +0x63d
>   github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).updatePodCondition()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1023 
> +0x65d
>   
> github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).HandleContainerStateUpdate()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1063 
> +0x30e
>   
> github.com/apache/yunikorn-k8shim/pkg/callback.(*AsyncRMCallback).UpdateContainerSchedulingState()
>       
> /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/callback/scheduler_callback.go:213
>  +0x45
>   
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
>   
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
>   
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
>   
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).internalInspectOutstandingRequests()
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:84 +0x38
>   
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.func3()
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x39
> Goroutine 50598 (running) created at:
>   github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:338 
> +0x8e
>   github.com/apache/yunikorn-k8shim/pkg/cache.newTaskState.func3()
>       
> /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task_state.go:394 
> +0x73
>   github.com/looplab/fsm.(*FSM).enterStateCallbacks()
>       /home/bacskop/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:470 +0xd1
>   github.com/looplab/fsm.(*FSM).Event.func2.1()
>       /home/bacskop/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:363 +0x1fe
>   github.com/looplab/fsm.transitionerStruct.transition()
>       /home/bacskop/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:422 +0x92
>   github.com/looplab/fsm.(*transitionerStruct).transition()
>       <autogenerated>:1 +0x29
>   github.com/looplab/fsm.(*FSM).doTransition()
>       /home/bacskop/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:407 +0xa72
>   github.com/looplab/fsm.(*FSM).Event()
>       /home/bacskop/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:390 +0xa48
>   github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).handle()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:122 
> +0x26d
>   
> github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).TaskEventHandler.func1()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1113 
> +0x104
>   github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1.2()
>       
> /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/dispatcher/dispatcher.go:195
>  +0x58
> Goroutine 66 (running) created at:
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService()
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x4c6
>   
> github.com/apache/yunikorn-core/pkg/entrypoint.startAllServicesWithParameters()
>       /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:87 +0x37a
>   github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServices()
>       /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:44 +0x64
>   github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServicesWithLogger()
>       /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:55 +0x3b
>   main.main()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cmd/shim/main.go:51 
> +0x775
> ==================
>  {noformat}
> The problem is that inside {{{}Context.updatePodCondition(){}}}, we modify 
> the {{PodStatus}} field of the pod object, which we use as a source of copies.
> It might be better to store the condition separately and protect it with the 
> task lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to