[
https://issues.apache.org/jira/browse/YUNIKORN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Craig Condit resolved YUNIKORN-2233.
------------------------------------
Fix Version/s: 1.5.0
Target Version: 1.5.0
Resolution: Fixed
Merged to master.
> Scheduler cannot be stopped properly
> ------------------------------------
>
> Key: YUNIKORN-2233
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2233
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler, shim - kubernetes, test - smoke, test - unit
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.0
>
> Attachments: goroutines_core.png
>
>
> Once the {{Scheduler}}, {{RMProxy}}, {{HealthChecker}},
> {{internalMetricsCollector}}, etc. objects are initialized, there's no way to
> stop the background goroutines started by them. This isn't necessarily a
> problem in a real environment, because restarting the scheduler core on its
> own is not a requirement.
> However, in the tests which use {{MockScheduler}}, we don't want goroutines
> to keep running and consume memory. Therefore, we need a proper {{Stop()}}
> method on the most relevant types to make sure that the stop signal
> propagates to all goroutines.
> The attached screenshot shows what happens after we call {{MockScheduler.Stop()}}
> in the core (it's very similar to the {{MockScheduler}} in the shim).
> Goroutines are still running from the following types:
> * {{Scheduler}}
> * {{nodesResourceUsageMonitor}}
> * {{HealthChecker}}
> * {{RMProxy}}
> * {{EventSystemImpl}}
> * {{partitionManager}}
> * {{internalMetricsCollector}}
> Something similar happens inside the shim, although it's less problematic.
> {{KubernetesShim.Stop()}} needs to be improved, because two goroutines
> depend on "stopChan", but we send a message only once. It's much better to
> call {{close(ss.stopChan)}}, which causes all readers to receive the stop
> signal.
> Also, small changes are necessary in the shim-side {{MockScheduler}} to
> initiate shutdown properly (right now, we don't call
> {{fc.coreContext.StopAll()}}).
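The difference between sending once on a stop channel and closing it can be sketched as follows. This is a minimal illustration, not the code merged for this issue: the `worker` type, its fields, and `newWorker` are hypothetical names and do not correspond to YuniKorn types. A single send wakes exactly one receiver, while `close()` unblocks every goroutine waiting on the channel, and a `sync.WaitGroup` lets `Stop()` return only after all of them have exited.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical component used only for illustration;
// it is not a YuniKorn type.
type worker struct {
	stopChan chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
}

func newWorker() *worker {
	return &worker{stopChan: make(chan struct{})}
}

// Start launches n background goroutines that all block on the
// same stop channel.
func (w *worker) Start(n int) {
	for i := 0; i < n; i++ {
		w.wg.Add(1)
		go func() {
			defer w.wg.Done()
			// A receive on a closed channel returns immediately,
			// so every goroutine waiting here is released by close().
			<-w.stopChan
		}()
	}
}

// Stop closes the stop channel (at most once, so repeated calls do
// not panic) and waits until every goroutine has returned.
func (w *worker) Stop() {
	w.stopOnce.Do(func() { close(w.stopChan) })
	w.wg.Wait()
}

func main() {
	w := newWorker()
	w.Start(2) // two goroutines depend on the same channel
	w.Stop()   // returns only after both have exited
	fmt.Println("all goroutines stopped")
}
```

Had `Stop()` performed a single `w.stopChan <- struct{}{}` instead, only one of the two goroutines would have received the signal and `wg.Wait()` would block forever, which mirrors the problem described for "stopChan" above.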
--
This message was sent by Atlassian Jira
(v8.20.10#820010)