[ 
https://issues.apache.org/jira/browse/YUNIKORN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2233.
------------------------------------
     Fix Version/s: 1.5.0
    Target Version: 1.5.0
        Resolution: Fixed

Merged to master.

> Scheduler cannot be stopped properly
> ------------------------------------
>
>                 Key: YUNIKORN-2233
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2233
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler, shim - kubernetes, test - smoke, test - unit
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>         Attachments: goroutines_core.png
>
>
> Once we have a {{Scheduler}}, {{RMProxy}}, {{HealthChecker}}, 
> {{internalMetricsCollector}}, etc. objects initialized, there's no way to 
> stop the background goroutines started by them. This isn't necessarily a 
> problem in a real environment, because restarting the scheduler core on its 
> own is not a requirement. 
> However, in the tests which use {{MockScheduler}}, we don't want goroutines 
> to keep running and consume memory. Therefore, we need a proper {{Stop()}} 
> method on the most relevant types to make sure that the stop signal 
> propagates to all goroutines.
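A minimal sketch of the kind of Stop() described above. It is not the actual YuniKorn code. The type `backgroundService` and its `Start`/`Stop` methods are hypothetical stand-ins for types like Scheduler or HealthChecker. The pattern is to close a shared stop channel exactly once, so every goroutine sees the signal, and then wait on a `sync.WaitGroup` until they have all returned.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// backgroundService is a hypothetical stand-in for a type such as Scheduler
// or HealthChecker that starts goroutines when it is started.
type backgroundService struct {
	stopChan chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
}

func newBackgroundService() *backgroundService {
	return &backgroundService{stopChan: make(chan struct{})}
}

// Start launches n worker goroutines that run until Stop is called.
func (s *backgroundService) Start(n int) {
	for i := 0; i < n; i++ {
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			ticker := time.NewTicker(10 * time.Millisecond)
			defer ticker.Stop()
			for {
				select {
				case <-s.stopChan:
					return // stop signal received: let the goroutine exit
				case <-ticker.C:
					// periodic background work would happen here
				}
			}
		}()
	}
}

// Stop closes stopChan exactly once, so every worker observes the signal,
// and then blocks until all workers have returned.
func (s *backgroundService) Stop() {
	s.stopOnce.Do(func() { close(s.stopChan) })
	s.wg.Wait()
}

func main() {
	s := newBackgroundService()
	s.Start(3)
	time.Sleep(30 * time.Millisecond)
	s.Stop()
	s.Stop() // safe: the channel is only closed once
	fmt.Println("all workers stopped")
}
```

Because of the `sync.Once`, calling Stop() more than once is harmless. This can matter when a test helper and a deferred cleanup both try to shut the same object down.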
> The attached screenshot shows what happens after we call {{MockScheduler.Stop()}} 
> in the core (it's very similar to the {{MockScheduler}} in the shim). 
> Goroutines are still running from the following types:
> * {{Scheduler}}
> * {{nodesResourceUsageMonitor}}
> * {{HealthChecker}}
> * {{RMProxy}}
> * {{EventSystemImpl}}
> * {{partitionManager}}
> * {{internalMetricsCollector}}
> Something similar happens inside the shim, although it's less problematic. 
> {{KubernetesShim.Stop()}} needs to be improved, because two goroutines 
> depend on "stopChan", but we send a value on it only once, so only one of 
> them receives the stop signal. It's much better to call 
> {{close(ss.stopChan)}}, which causes all readers to receive the stop signal.
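The difference between sending once and closing the channel is easy to demonstrate. In the sketch below, the hypothetical function `signalledReaders` starts several goroutines that wait on the same stop channel. It signals them either with one send or with `close`, then counts how many of them saw the signal.

```go
package main

import (
	"fmt"
	"time"
)

// signalledReaders starts `readers` goroutines that all wait on the same
// stop channel, delivers the stop signal either with a single send or with
// close, and reports how many goroutines observed the signal.
func signalledReaders(readers int, useClose bool) int {
	stopChan := make(chan struct{})
	got := make(chan struct{}, readers)
	for i := 0; i < readers; i++ {
		go func() {
			<-stopChan
			got <- struct{}{}
		}()
	}

	if useClose {
		close(stopChan) // every pending and future receive returns immediately
	} else {
		stopChan <- struct{}{} // a single value wakes exactly one reader
	}

	count := 0
	timeout := time.After(200 * time.Millisecond)
loop:
	for count < readers {
		select {
		case <-got:
			count++
		case <-timeout:
			break loop
		}
	}

	if !useClose {
		close(stopChan) // release the readers that never saw the single send
	}
	return count
}

func main() {
	fmt.Println("single send:", signalledReaders(2, false))
	fmt.Println("close:", signalledReaders(2, true))
}
```

With two readers, the single send wakes one goroutine and `close` wakes both.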
> Also, small changes are necessary in the shim-side {{MockScheduler}} to 
> initiate shutdown properly (right now, we don't call 
> {{fc.coreContext.StopAll()}}).
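A rough sketch of the shim-side change described above. The names `coreContext` and `StopAll` echo the issue text, but the types, fields, and interface here are hypothetical and do not reflect the real MockScheduler. The stopping order is also an assumption: the shim is stopped first, then the core components in reverse start order.

```go
package main

import "fmt"

// stoppable is a hypothetical interface for anything with a Stop method.
type stoppable interface {
	Stop()
}

// coreContext is a hypothetical holder of core-side components
// (Scheduler, RMProxy, HealthChecker, ...), kept in start order.
type coreContext struct {
	components []stoppable
}

// StopAll stops the components in reverse start order (an assumed policy).
func (c *coreContext) StopAll() {
	for i := len(c.components) - 1; i >= 0; i-- {
		c.components[i].Stop()
	}
}

// mockScheduler is a hypothetical shim-side test helper.
type mockScheduler struct {
	shim        stoppable
	coreContext *coreContext
}

// Stop shuts down the shim and then the core, instead of leaving the core
// goroutines running after the test finishes.
func (m *mockScheduler) Stop() {
	m.shim.Stop()
	m.coreContext.StopAll()
}

// recorder is a test double that logs the order in which Stop is called.
type recorder struct {
	name string
	log  *[]string
}

func (r recorder) Stop() { *r.log = append(*r.log, r.name) }

func main() {
	var order []string
	m := &mockScheduler{
		shim: recorder{"shim", &order},
		coreContext: &coreContext{components: []stoppable{
			recorder{"scheduler", &order},
			recorder{"rmproxy", &order},
		}},
	}
	m.Stop()
	fmt.Println(order) // [shim rmproxy scheduler]
}
```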



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
