wilfred-s commented on a change in pull request #72: Add ability to support 3rd 
party K8s operator integration
URL: 
https://github.com/apache/incubator-yunikorn-k8shim/pull/72#discussion_r383604250
 
 

 ##########
 File path: pkg/appmgmt/appmgmt_recovery.go
 ##########
 @@ -31,20 +31,24 @@ import (
 
 func (svc *AppManagementService) WaitForRecovery(maxTimeout time.Duration) 
error {
        if !svc.apiProvider.IsTestingMode() {
-               apps := svc.kickoffRecovery()
+               apps, err := svc.recoverApps()
+               if err != nil {
+                       return err
+               }
+
                return svc.waitForAppRecovery(apps, maxTimeout)
        }
        return nil
 }
 
-func (svc *AppManagementService) kickoffRecovery() 
map[string]interfaces.ManagedApp {
+func (svc *AppManagementService) recoverApps() 
(map[string]interfaces.ManagedApp, error) {
        recoveringApps := make(map[string]interfaces.ManagedApp)
        for _, mgr := range svc.managers {
                if m, ok := mgr.(interfaces.Recoverable); ok {
                        appMetas, err := m.ListApplications()
                        if err != nil {
                                log.Logger.Error("failed to list apps", 
zap.Error(err))
-                               return recoveringApps
+                               return recoveringApps, err
                        }
 
 Review comment:
   This change does not fix the issue I was trying to point out: due to 
failures on one manager we will not recover the other manager(s). I am also 
missing the feedback processing: if recovery fails do we just keep on using the 
manager as if nothing happened? How do we account for the inconsistent state?
   
   Example:
   - set up two managers
   - listing on the first manager fails (line 47)
   - recovery on the 2nd manager is not attempted
   - If the failure occurs on the 2nd manager the recoverApps is not nil but 
gets ignored
   
   Since we have a map to iterate over the order in which the managers are 
checked is random. This leaves us in a weird state: both app managers are out 
of sync with the real state, and thus the scheduler is out of sync.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to