[
https://issues.apache.org/jira/browse/YUNIKORN-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155425#comment-17155425
]
Wilfred Spiegelenburg commented on YUNIKORN-272:
------------------------------------------------
The change causes a data race in the tests. I have attached the race detector
log: [^data_race.txt]. The important parts are the failure itself:
{code:java}
--- FAIL: TestBasicSchedulerAutoAllocation (0.15s)
testing.go:809: race detected during execution of test
{code}
and this part of the stack trace for the writer:
{code:java}
Previous write at 0x00c05a3096f0 by goroutine 17:
github.com/apache/incubator-yunikorn-core/pkg/entrypoint.startAllServicesWithParameters()
...
/home/travis/gopath/src/github.com/apache/incubator-yunikorn-core/pkg/scheduler/tests/scheduler_smoke_test.go:393
+0xdb
testing.tRunner()
/home/travis/.gimme/versions/go1.12.linux.amd64/src/testing/testing.go:865
+0x163
{code}
and for the reader:
{code:java}
Read at 0x00c05a3096f0 by goroutine 81:
github.com/apache/incubator-yunikorn-core/pkg/events.(*EventCache).AddEvent()
/home/travis/gopath/src/github.com/apache/incubator-yunikorn-core/pkg/events/event_cache.go:97
+0x85
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*SchedulingApplication).tryAllocate()
{code}
The reader, goroutine 81, was created in:
{code:java}
github.com/apache/incubator-yunikorn-core/pkg/scheduler/tests.TestContainerStateUpdater()
/home/travis/gopath/src/github.com/apache/incubator-yunikorn-core/pkg/scheduler/tests/scheduler_plugin_test.go:66
+0xdb
testing.tRunner()
/home/travis/.gimme/versions/go1.12.linux.amd64/src/testing/testing.go:865
+0x163
{code}
As you can see, the reader and the writer come from two completely different
tests:
* reader: scheduler_plugin_test.go:66
* writer: scheduler_smoke_test.go:393
{{TestBasicSchedulerAutoAllocation}} is defined in the file
{{scheduler_smoke_test.go}}. By the time it runs, goroutine 81 should not exist
anymore and we should have re-initialised.
We cannot currently perform a full shutdown of all the services that we are
running in the core. Changing the test to not start all services works around
this issue. I know this has been discussed before, but I don't think we ever
followed up.
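To illustrate what a full shutdown would need, here is a minimal sketch of a
background service that can be stopped and waited on, so that no goroutine
outlives the test that started it. All names ({{eventService}}, {{Start}},
{{Stop}}, {{Add}}, {{Processed}}) are hypothetical and are not the core's
actual API; it only shows the general stop channel plus {{sync.WaitGroup}}
pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// eventService is a hypothetical stand-in for a background service such as
// an event cache. It is NOT the actual YuniKorn type.
type eventService struct {
	events chan string   // unbuffered: Add returns once the worker took the event
	stop   chan struct{} // closed by Stop to signal the worker to exit
	once   sync.Once     // makes Stop safe to call more than once
	wg     sync.WaitGroup

	mu        sync.Mutex
	processed int
}

func newEventService() *eventService {
	return &eventService{
		events: make(chan string),
		stop:   make(chan struct{}),
	}
}

// Start launches the single worker goroutine.
func (s *eventService) Start() {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		for {
			select {
			case <-s.stop:
				return
			case <-s.events:
				s.mu.Lock()
				s.processed++
				s.mu.Unlock()
			}
		}
	}()
}

// Add hands an event to the worker. It returns false if the service has
// been stopped, instead of blocking forever.
func (s *eventService) Add(e string) bool {
	select {
	case s.events <- e:
		return true
	case <-s.stop:
		return false
	}
}

// Stop signals the worker and blocks until it has exited, so nothing is
// left running when the caller (for example a test) returns.
func (s *eventService) Stop() {
	s.once.Do(func() { close(s.stop) })
	s.wg.Wait()
}

// Processed returns the number of events handled so far.
func (s *eventService) Processed() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.processed
}

func main() {
	// Two consecutive "tests": each one creates its own service and stops it
	// completely before the next one re-initialises.
	for i := 1; i <= 2; i++ {
		svc := newEventService()
		svc.Start()
		svc.Add("allocation")
		svc.Stop()
		fmt.Printf("run %d processed %d event(s)\n", i, svc.Processed())
	}
}
```

With something along these lines a test could defer {{Stop()}} right after
{{Start()}}, and the next test would re-initialise without a leftover reader
from the previous one.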
I opened a [PR|https://github.com/yangwwei/incubator-yunikorn-core/pull/2]
against your repo to add the change.
The Travis build passed with the changes on top:
https://travis-ci.com/yangwwei/incubator-yunikorn-core/builds/175146213
> Enable event cache in branch-0.9
> --------------------------------
>
> Key: YUNIKORN-272
> URL: https://issues.apache.org/jira/browse/YUNIKORN-272
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - cache
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Labels: pull-request-available
> Attachments: data_race.txt
>
>
> The flag needs to be true to enable this feature. Set this for branch-0.9.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)