[
https://issues.apache.org/jira/browse/YUNIKORN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wilfred Spiegelenburg resolved YUNIKORN-1714.
---------------------------------------------
Fix Version/s: 1.3.0
Resolution: Fixed
Change committed, thanks for the quick fix.
> Fatal error: concurrent write/read when calling Queue.RemoveApplication()
> -------------------------------------------------------------------------
>
> Key: YUNIKORN-1714
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1714
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.3.0
>
>
> Encountered this problem when doing some local testing with a lot of running
> applications:
> {noformat}
> fatal error: concurrent map read and map write
> goroutine 8785 [running]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).RemoveApplication(0xc0002e0840,
> 0xc004a1cc40)
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/queue.go:697
> +0x65
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).UnSetQueue(0xc004a1cc40)
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1493
> +0x45
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).moveTerminatedApp(0xc0002aa600,
> {0xc00372e4e0, 0x16})
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/partition.go:1409
> +0x73
> created by
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831
> +0xaa
> ...
> goroutine 8782 [runnable]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).timeoutStateTimer.func1()
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:298
> created by time.goFunc
> /snap/go/current/src/time/sleep.go:176 +0x32
> goroutine 8623 [runnable]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback.func1()
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831
> runtime.goexit()
> /snap/go/current/src/runtime/asm_amd64.s:1598 +0x1
> created by
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831
> +0xaa
> goroutine 8786 [runnable]:
> go.uber.org/zap.(*stacktrace).Next(...)
> /home/bacskop/go/pkg/mod/go.uber.org/[email protected]/stacktrace.go:127
> go.uber.org/zap.(*Logger).check(0xc0003bb650, 0x0, {0x1e6c20c, 0x2c})
> /home/bacskop/go/pkg/mod/go.uber.org/[email protected]/logger.go:372 +0x7e5
> go.uber.org/zap.(*Logger).Info(0xc0002e0420?, {0x1e6c20c?, 0x1?},
> {0xc005745680, 0x2, 0x2})
> /home/bacskop/go/pkg/mod/go.uber.org/[email protected]/logger.go:219 +0x3b
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).RemoveApplication(0xc0002e0840,
> 0xc004aa0380)
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/queue.go:742
> +0xcc6
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).UnSetQueue(0xc004aa0380)
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1493
> +0x45
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).moveTerminatedApp(0xc0002aa600,
> {0xc00372e498, 0x16})
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/partition.go:1409
> +0x73
> created by
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback
>
> /home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831
> +0xaa
> {noformat}
> There is an unprotected access to {{sq.applications[]}}: the code checks whether
> an application exists without holding the queue lock. This can fail because the
> map can be modified concurrently, which the Go runtime detects and treats as a
> fatal error.
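To illustrate the kind of locking the description calls for, here is a minimal sketch. It is not the committed fix. The types queue and application and their methods are hypothetical, simplified stand-ins for the scheduler objects. The sketch assumes the queue guards its map with a sync.RWMutex, and it performs the existence check and the delete under the same write lock so that concurrent removals never touch the map unprotected.

```go
package main

import (
	"fmt"
	"sync"
)

// application is a hypothetical stand-in for the scheduler's application object.
type application struct {
	id string
}

// queue is a hypothetical, simplified stand-in for the scheduler queue.
// The embedded RWMutex guards every access to the applications map.
type queue struct {
	sync.RWMutex
	applications map[string]*application
}

func newQueue() *queue {
	return &queue{applications: make(map[string]*application)}
}

// addApplication registers the application under the write lock.
func (q *queue) addApplication(app *application) {
	q.Lock()
	defer q.Unlock()
	q.applications[app.id] = app
}

// removeApplication checks for the application and deletes it while holding
// the write lock, so the lookup can never race with another writer.
// It reports whether the application was present.
func (q *queue) removeApplication(app *application) bool {
	q.Lock()
	defer q.Unlock()
	if _, ok := q.applications[app.id]; !ok {
		return false
	}
	delete(q.applications, app.id)
	return true
}

// size reads the map under the read lock.
func (q *queue) size() int {
	q.RLock()
	defer q.RUnlock()
	return len(q.applications)
}

func main() {
	q := newQueue()
	apps := make([]*application, 100)
	for i := range apps {
		apps[i] = &application{id: fmt.Sprintf("app-%d", i)}
		q.addApplication(apps[i])
	}

	// Remove all applications from separate goroutines, similar to many
	// terminated-application callbacks firing at the same time.
	var wg sync.WaitGroup
	for _, app := range apps {
		wg.Add(1)
		go func(a *application) {
			defer wg.Done()
			q.removeApplication(a)
		}(app)
	}
	wg.Wait()

	fmt.Println("remaining applications:", q.size())
}
```

Running the sketch with go run -race should report no data race. If the lock is dropped from the existence check, the race detector should flag that lookup.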
--
This message was sent by Atlassian Jira
(v8.20.10#820010)