[ 
https://issues.apache.org/jira/browse/YUNIKORN-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155425#comment-17155425
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-272:
------------------------------------------------

The change causes a race in the tests. I have attached a log of the detected 
race: [^data_race.txt]. The important parts are the failure itself:
{code:java}
--- FAIL: TestBasicSchedulerAutoAllocation (0.15s)
    testing.go:809: race detected during execution of test
{code}
And these parts of the stack traces, starting with the writer:
{code:java}
Previous write at 0x00c05a3096f0 by goroutine 17:
  github.com/apache/incubator-yunikorn-core/pkg/entrypoint.startAllServicesWithParameters()
...
      /home/travis/gopath/src/github.com/apache/incubator-yunikorn-core/pkg/scheduler/tests/scheduler_smoke_test.go:393 +0xdb
  testing.tRunner()
      /home/travis/.gimme/versions/go1.12.linux.amd64/src/testing/testing.go:865 +0x163
{code}
and for the reader:
{code:java}
Read at 0x00c05a3096f0 by goroutine 81:
  github.com/apache/incubator-yunikorn-core/pkg/events.(*EventCache).AddEvent()
      /home/travis/gopath/src/github.com/apache/incubator-yunikorn-core/pkg/events/event_cache.go:97 +0x85
  github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*SchedulingApplication).tryAllocate()
{code}
Goroutine 81, the reader, was created in:
{code:java}
  github.com/apache/incubator-yunikorn-core/pkg/scheduler/tests.TestContainerStateUpdater()
      /home/travis/gopath/src/github.com/apache/incubator-yunikorn-core/pkg/scheduler/tests/scheduler_plugin_test.go:66 +0xdb
  testing.tRunner()
      /home/travis/.gimme/versions/go1.12.linux.amd64/src/testing/testing.go:865 +0x163
{code}
As you can see, the reader and the writer belong to two completely different tests:
 * reader: scheduler_plugin_test.go:66
 * writer: scheduler_smoke_test.go:393

{{TestBasicSchedulerAutoAllocation}} is defined in the file 
{{scheduler_smoke_test.go}}. By the time that test runs, goroutine 81 should 
no longer exist and we should have re-initialised.
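
To illustrate the kind of access the race detector flags, here is a minimal sketch. It assumes a package-level cache pointer that is re-initialised on start-up while a goroutine from an earlier test still reads it. All names ({{GetCache}}, {{ResetCache}}, the simplified {{EventCache}}) are hypothetical and are not the code in {{pkg/events}}. Guarding the pointer with a lock removes the data race itself, although a leftover goroutine could still add events to whichever cache is current.

```go
package main

import (
	"fmt"
	"sync"
)

// EventCache is a simplified, hypothetical stand-in (not the YuniKorn type).
type EventCache struct {
	mu     sync.Mutex
	events []string
}

// AddEvent appends an event under the cache's own lock.
func (c *EventCache) AddEvent(e string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.events = append(c.events, e)
}

// Len returns the number of stored events.
func (c *EventCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.events)
}

var (
	cacheLock sync.RWMutex // protects the cache pointer below
	cache     *EventCache
)

// ResetCache replaces the shared cache: the "writer" side of the race.
func ResetCache() *EventCache {
	cacheLock.Lock()
	defer cacheLock.Unlock()
	cache = &EventCache{}
	return cache
}

// GetCache returns the current cache: the "reader" side of the race.
func GetCache() *EventCache {
	cacheLock.RLock()
	defer cacheLock.RUnlock()
	return cache
}

func main() {
	ResetCache()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			GetCache().AddEvent(fmt.Sprintf("event-%d", id))
		}(i)
	}
	ResetCache() // concurrent re-init; pointer access is synchronised
	wg.Wait()
	fmt.Println("done")
}
```

Running a sketch like this with {{go run -race}} should not report a race on the pointer, but it does not address the leftover goroutine itself.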

We cannot currently perform a full shutdown of all the services that we are 
running in the core. Changing the test to not start all services works around 
this issue. I know this has been discussed before, but I don't think we ever 
followed up on it.

I opened a [PR|https://github.com/yangwwei/incubator-yunikorn-core/pull/2] 
against your repo to add the change.

The Travis build passed with the changes on top: 
https://travis-ci.com/yangwwei/incubator-yunikorn-core/builds/175146213

> Enable event cache in branch-0.9
> --------------------------------
>
>                 Key: YUNIKORN-272
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-272
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - cache
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: data_race.txt
>
>
> The flag needs to be true to enable this feature. Set this for branch-0.9.



