[
https://issues.apache.org/jira/browse/YUNIKORN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839985#comment-17839985
]
Wilfred Spiegelenburg commented on YUNIKORN-2576:
-------------------------------------------------
When I run this only the first run passes and all other 9 runs fail with an
assertion failure like this:
{code:java}
dispatch_test.go:244: assertion failed: 10 (int32) != 1 (int32) {code}
This test was never designed to be run multiple times as those global var
values are not reset to clean up. Other tests in the same file also break as
they expect a 0 value for the async count when tey start. That again is only
true for the first run not for the runs 2..10.
I do see a data race but the race is triggered by
{{TestExceedAsyncDispatchLimit()}}
Further point is that we should not use the
{{atomic.AddInt32(&asyncDispatchCount, 1)}} but we should use the
{{atomic.Int32}} introduced in go 1.19 and call {{asyncDispatchCount.Add(1)}}
Not sure if this requires a full refactor of the dispatcher or that these tests
need to be fixed to be able to handle multiple runs correctly.
> Data Race: Flaky tests in dispatcher_test.go
> --------------------------------------------
>
> Key: YUNIKORN-2576
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2576
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: test - unit
> Reporter: Yu-Lin Chen
> Priority: Major
> Attachments: shim-race.txt
>
>
> How to reproduce:
> # In Shim, run 'go test ./pkg/... -race -count=10 > shim-race.txt'
>
> {code:java}
> WARNING: DATA RACE
> Write at 0x0000035315e0 by goroutine 88:
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.initDispatcher()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:73
> +0x2c4
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.createDispatcher()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:305
> +0x2f
> runtime.Goexit()
> /usr/local/go/src/runtime/panic.go:626 +0x5d
> testing.(*T).FailNow()
> <autogenerated>:1 +0x31
> gotest.tools/v3/assert.Equal()
>
> /home/chenyulin0719/go/pkg/mod/gotest.tools/[email protected]/assert/assert.go:205
> +0x1aa
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.TestDispatchTimeout()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:244
> +0x2ba
> testing.tRunner()
> /usr/local/go/src/testing/testing.go:1689 +0x21e
> testing.(*T).Run.gowrap1()
> /usr/local/go/src/testing/testing.go:1742 +0x44Previous read at
> 0x0000035315e0 by goroutine 90:
>
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch.func1()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:188
> +0x2f5
>
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch.gowrap1()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:197
> +0x6eGoroutine 88 (running) created at:
> testing.(*T).Run()
> /usr/local/go/src/testing/testing.go:1742 +0x825
> testing.runTests.func1()
> /usr/local/go/src/testing/testing.go:2161 +0x85
> testing.tRunner()
> /usr/local/go/src/testing/testing.go:1689 +0x21e
> testing.runTests()
> /usr/local/go/src/testing/testing.go:2159 +0x8be
> testing.(*M).Run()
> /usr/local/go/src/testing/testing.go:2027 +0xf17
> main.main()
> _testmain.go:55 +0x2bdGoroutine 90 (running) created at:
>
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).asyncDispatch()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:178
> +0x391
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.(*Dispatcher).dispatch()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:164
> +0xbb
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.Dispatch()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:142
> +0x71
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.TestDispatchTimeout()
>
> /home/chenyulin0719/yunikorn/yunikorn-k8shim/pkg/dispatcher/dispatch_test.go:232
> +0x244
> testing.tRunner()
> /usr/local/go/src/testing/testing.go:1689 +0x21e
> testing.(*T).Run.gowrap1()
> /usr/local/go/src/testing/testing.go:1742 +0x44
> =================={code}
> Root Cause:
> * The [globla
> vairables|https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L46-L51]
> in dispatcher.go is not protected when running unit tests. Each unit test
> will run initDispatcher() through
> [createDispatcher()|https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatch_test.go#L305].
> * Race occurs if any other unit tests read/write the global variables before
> or after initDispatcher(). ex: TestDispatchTimeout()
> [https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L188]
>
> Solution to be discussed:
> # Refactor dispatcher.go and encapsulates global variables to Dispatcher
> struct
> , change Dispatcher.Start(), Dispatcher.Stop() to type method
> # Implement Singleton in getDispatcher() and add a new function
> newDispatcher()
> # Create a new Dispatcher for each unit test
>
> The race issue only happens in unit test becasue the shared vairable was
> protected by
> once.Do(initDispatcher) in dispatcher.go :
> [https://github.com/chenyulin0719/yunikorn-k8shim/blob/64b204a2fb3b83fde9d86ea58f5f0d1e42187472/pkg/dispatcher/dispatcher.go#L131]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]