PawasChhokra opened a new pull request #1453:
URL: https://github.com/apache/samza/pull/1453


   Feature: This feature makes all standby container requests rack aware, so 
that each active container and its corresponding standby containers are always 
placed on different racks. This reduces application downtime during rack 
failures. For rack awareness to be honored, the value of 
`job.standbytasks.replication.factor` must be at most 2.
   
   Changes: 
   1. Added a new interface called `FaultDomainManager`, which is implemented 
by the `YarnFaultDomainManager` class for Yarn. This class retrieves the 
node-to-rack information from Yarn. 
   2. Defined what `FaultDomain` and `FaultDomainType` mean.
   3. Added a config to enable fault-domain-aware standby requests.
   4. Added metrics to track fault-domain-aware requests.
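
   To illustrate how the pieces in the list above could fit together, here is a 
minimal sketch. Only the type names `FaultDomainManager`, `FaultDomain` and 
`FaultDomainType` come from this PR; every method name, field, and the 
`StaticFaultDomainManager` stand-in (which reads a fixed map instead of 
querying Yarn) are assumptions made for illustration and may differ from the 
actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: every method and field name here is an assumption,
// not the API added by this PR.
class FaultDomainSketch {

  // Kind of fault domain; only racks are discussed in this PR.
  enum FaultDomainType { RACK }

  // A fault domain is identified by its type plus an id (for example a rack name).
  static final class FaultDomain {
    final FaultDomainType type;
    final String id;

    FaultDomain(FaultDomainType type, String id) {
      this.type = Objects.requireNonNull(type);
      this.id = Objects.requireNonNull(id);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof FaultDomain)) return false;
      FaultDomain other = (FaultDomain) o;
      return type == other.type && id.equals(other.id);
    }

    @Override
    public int hashCode() {
      return Objects.hash(type, id);
    }
  }

  // Cluster-manager-agnostic view of host to fault-domain information.
  interface FaultDomainManager {
    // Returns the fault domain of the host, or null if the host is unknown.
    FaultDomain getFaultDomainOfHost(String host);

    // True only when both hosts are known and share a fault domain.
    boolean hasSameFaultDomain(String hostA, String hostB);
  }

  // Stand-in for a Yarn-backed implementation: reads a fixed map instead of querying Yarn.
  static final class StaticFaultDomainManager implements FaultDomainManager {
    private final Map<String, FaultDomain> hostToDomain = new HashMap<>();

    StaticFaultDomainManager(Map<String, String> hostToRack) {
      hostToRack.forEach((host, rack) ->
          hostToDomain.put(host, new FaultDomain(FaultDomainType.RACK, rack)));
    }

    @Override
    public FaultDomain getFaultDomainOfHost(String host) {
      return hostToDomain.get(host);
    }

    @Override
    public boolean hasSameFaultDomain(String hostA, String hostB) {
      FaultDomain a = hostToDomain.get(hostA);
      return a != null && a.equals(hostToDomain.get(hostB));
    }
  }
}
```

   With such an interface, a standby request for a host could be rejected 
whenever `hasSameFaultDomain(activeHost, candidateHost)` returns true, which is 
one way to keep an active container and its standby on different racks.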
   
   API Changes: Added a new FaultDomainManager interface.
   
   Tests: Unit tests, plus manual testing with a running job.
   
   Upgrade instructions: TBD
   
   Usage Instructions: For a job with host affinity and standby containers, set 
the config `cluster-manager.fault-domain-aware.standby.enabled` to `true` to 
enable this feature. 
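
   As a rough illustration of the two settings mentioned in this description, 
the sketch below checks a plain key/value config for the enable flag and the 
replication-factor limit. The two config keys are taken from this PR text; the 
`StandbyConfigCheck` class, its method name, and the default values used when a 
key is absent are assumptions for illustration only, not Samza's actual 
validation logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Samza. Defaults below are assumptions.
class StandbyConfigCheck {
  static final String FAULT_DOMAIN_AWARE_STANDBY_ENABLED =
      "cluster-manager.fault-domain-aware.standby.enabled";
  static final String STANDBY_REPLICATION_FACTOR =
      "job.standbytasks.replication.factor";

  // Returns true when the feature flag is on and the replication factor is at most 2.
  // Throws NumberFormatException if the replication factor is not an integer.
  static boolean isRackAwarenessHonored(Map<String, String> config) {
    boolean enabled = Boolean.parseBoolean(
        config.getOrDefault(FAULT_DOMAIN_AWARE_STANDBY_ENABLED, "false")); // assumed default
    int replicationFactor = Integer.parseInt(
        config.getOrDefault(STANDBY_REPLICATION_FACTOR, "1")); // assumed default
    return enabled && replicationFactor <= 2;
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put(FAULT_DOMAIN_AWARE_STANDBY_ENABLED, "true");
    config.put(STANDBY_REPLICATION_FACTOR, "2");
    System.out.println(isRackAwarenessHonored(config));
  }
}
```

   The check does not cover host affinity, which the job also needs to have 
enabled for this feature to apply.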


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

