PawasChhokra opened a new pull request #1453: URL: https://github.com/apache/samza/pull/1453
Feature: This feature makes all standby container requests rack aware, so that each active container and its corresponding standby containers are always placed on different racks. This reduces application downtime during rack failures. For rack awareness to be honored, the value of `job.standbytasks.replication.factor` must be at most 2.

Changes:
1. Added a new interface called `FaultDomainManager`, with a Yarn implementation called `YarnFaultDomainManager`. This class retrieves node-to-rack information from Yarn.
2. Defined what `FaultDomain` and `FaultDomainType` mean.
3. Added a config to enable fault-domain-aware requests.
4. Added metrics to track fault-domain-aware requests.

API Changes: Added a new `FaultDomainManager` interface.

Tests: Unit tests, plus testing with a running job.

Upgrade instructions: TBD

Usage Instructions: For a job with host affinity and standby containers, set the config `cluster-manager.fault-domain-aware.standby.enabled` to true to enable this feature.
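For readers who want a concrete picture of the rack-aware check described in this PR, here is a minimal sketch in Java. It is not the code from the PR. Only the names `FaultDomainManager`, `FaultDomain`, and `FaultDomainType` come from the description above. The method signatures, the `StaticFaultDomainManager` class (a stand-in for `YarnFaultDomainManager` that reads a fixed host-to-rack map instead of querying Yarn), and the `pickStandbyHost` helper are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

public class FaultDomainSketch {

    // The PR only discusses racks; other types are not assumed here.
    public enum FaultDomainType { RACK }

    // A fault domain is a (type, id) pair, e.g. (RACK, "rack-1").
    public static final class FaultDomain {
        final FaultDomainType type;
        final String id;

        FaultDomain(FaultDomainType type, String id) {
            this.type = type;
            this.id = id;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof FaultDomain)) return false;
            FaultDomain other = (FaultDomain) o;
            return type == other.type && id.equals(other.id);
        }

        @Override public int hashCode() { return Objects.hash(type, id); }
    }

    // Hypothetical shape of the interface; the real signatures may differ.
    public interface FaultDomainManager {
        Set<FaultDomain> getFaultDomainsForHost(String host);
        boolean hasSameFaultDomains(String host1, String host2);
    }

    // Illustrative stand-in for a cluster-backed manager: a fixed host-to-rack map.
    public static final class StaticFaultDomainManager implements FaultDomainManager {
        private final Map<String, String> hostToRack;

        public StaticFaultDomainManager(Map<String, String> hostToRack) {
            this.hostToRack = hostToRack;
        }

        @Override public Set<FaultDomain> getFaultDomainsForHost(String host) {
            String rack = hostToRack.get(host);
            return rack == null
                ? Collections.<FaultDomain>emptySet()
                : Collections.singleton(new FaultDomain(FaultDomainType.RACK, rack));
        }

        // Two hosts are in the same fault domain if they share at least one domain.
        @Override public boolean hasSameFaultDomains(String host1, String host2) {
            Set<FaultDomain> shared = new HashSet<>(getFaultDomainsForHost(host1));
            shared.retainAll(getFaultDomainsForHost(host2));
            return !shared.isEmpty();
        }
    }

    // Returns the first candidate whose rack is known and differs from the active host's rack.
    public static Optional<String> pickStandbyHost(
            FaultDomainManager manager, String activeHost, List<String> candidates) {
        for (String candidate : candidates) {
            boolean rackKnown = !manager.getFaultDomainsForHost(candidate).isEmpty();
            if (rackKnown && !manager.hasSameFaultDomains(activeHost, candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FaultDomainManager manager = new StaticFaultDomainManager(
            Map.of("host-a", "rack-1", "host-b", "rack-1", "host-c", "rack-2"));
        // The active container runs on host-a; host-b shares its rack, host-c does not.
        System.out.println(pickStandbyHost(manager, "host-a", List.of("host-b", "host-c")));
    }
}
```

In this sketch a candidate with no rack information is skipped rather than trusted. That is a design choice of the example, not something the PR description specifies.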
