rmatharu opened a new pull request #1347: SAMZA-2511 : Adding logic to handle container stop fail
URL: https://github.com/apache/samza/pull/1347

Problem: The standby container manager does not handle container-stop failures. These events can occur when a certificate or authentication issue arises while the container-stop is executing. The standby-container failover flow relies on the stop-container call succeeding, so when the stop fails the failover never completes. As a result, the active container for which the failover was initiated is never started again. A container-placement action that runs into a container-stop failure is declared failed.

Cause: As described above.

Fix: The standby container manager now intercepts and handles these events by continuing the failover. It either selects another standby container (if one is present, i.e., rf > 2), uses a standby host, or falls back to any-host.

API changes: None

Upgrade Instructions: None
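The fallback order described under Fix can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class, enum, and method names are hypothetical, the container IDs and host names are made up, and it assumes the manager knows the other standby container IDs and, optionally, a standby host when the stop-fail event arrives.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these names come from the Samza code base.
final class StopFailFailoverSketch {

    // The three ways the failover can continue after a standby fails to stop.
    enum Action { STOP_ANOTHER_STANDBY, START_ON_STANDBY_HOST, START_ON_ANY_HOST }

    static final class Decision {
        final Action action;
        final String target; // standby container ID, host name, or null for any-host

        Decision(Action action, String target) {
            this.action = action;
            this.target = target;
        }
    }

    // Decide how to continue a failover once stopping failedStandbyId has failed.
    static Decision onStandbyStopFailed(String failedStandbyId,
                                        List<String> standbyIds,
                                        Optional<String> standbyHost) {
        // 1. Prefer another standby container (only exists when rf > 2).
        for (String id : standbyIds) {
            if (!id.equals(failedStandbyId)) {
                return new Decision(Action.STOP_ANOTHER_STANDBY, id);
            }
        }
        // 2. Otherwise request the active container on a known standby host.
        if (standbyHost.isPresent()) {
            return new Decision(Action.START_ON_STANDBY_HOST, standbyHost.get());
        }
        // 3. Last resort: let the active container start on any host.
        return new Decision(Action.START_ON_ANY_HOST, null);
    }
}
```

The point of the ordering is that the failover always ends in some request to restart the active container, so a single failed stop can no longer leave it down indefinitely.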
