Ashvin -

Instead of adding a Config API to enable self-healing per topology, an interested user can enable the config using --config-property during heron submit. For example,

heron submit <cluster-name> --config-property "heron.config.topology.healthmanager.mode=enable" <topology-file> <topology-class> <topology-name>

The advantage of this approach is that there is no hard-coded config in the code that will require later removal.

Thoughts?

cheers
/karthik

On Fri, Aug 4, 2017 at 8:57 AM, Ashvin A <[email protected]> wrote:

> Hi,
>
> We are in the process of merging the core building blocks of the topology
> health manager (HM) based on Dhalion. This integration is still
> experimental and needs to be tested thoroughly. So it is desired that the
> HM be activated on demand and remain disabled by default. Accordingly, we
> are proposing the following scheme to launch the HM process.
>
> We are thinking of satisfying the following constraints:
>
> 1. Launch on container-0, colocated with the scheduler and the metrics
>    cache.
> 2. Initially, HM will be disabled by default. This means the HM process
>    should not be started, to avoid any side effects. Once HM is well
>    tested, a system-wide configuration would enable HM for all topologies
>    submitted afterwards.
> 3. If a topology explicitly opts in through its configuration, HM will be
>    started and take actions as per the configuration, i.e. healthmgr.yaml.
> 4. Like other Heron processes, the executor should manage the HM's life
>    cycle.
>
> Accordingly, we propose the following.
>
> 1. Add a new Config API to enable self-healing per topology:
>    Config.enableHealthManager(Topology.HealthManagerMode mode). The
>    default value will be "system", meaning that the system-wide
>    configuration is used.
> 2. Add a new config to heron_internals.yaml:
>    "heron.healthmgr.default.mode". The value will be "disabled".
> 3. The scheduler will read the default value of the HM mode from the
>    heron_internals config file, as is done in SchedulerMain.setupLogging
>    [3]. It will provide either the user-configured mode value or the
>    default mode value to the executor as a command line argument.
> 4. Add the HM mode to the command line arguments of heron_executor.py.
>    This is similar to the executor command line arguments for
>    checkpointing [2].
> 5. The executor will launch HM if the mode is not "disabled".
> 6. Later, if the default HM mode value is set to "dryrun" or
>    "self-healing", HM will be launched for all newly submitted topologies.
>
> What do you think about this approach?
>
> Thanks,
> Ashvin
>
> [1] https://github.com/twitter/heron/pull/2132
> [2] https://github.com/twitter/heron/blob/master/heron/executor/src/python/heron_executor.py#L58
> [3] https://github.com/twitter/heron/blob/master/heron/scheduler-core/src/java/com/twitter/heron/scheduler/SchedulerMain.java#L277
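
For reference, the mode resolution described in the quoted proposal (steps 1 to 6) could look roughly like the sketch below. This is only an illustration: the function names resolve_hm_mode and should_launch_hm are hypothetical and are not part of Heron, and the only mode values used are the ones named in the proposal ("system", "disabled", "dryrun", "self-healing").

```python
# Hypothetical sketch of the proposed HM mode resolution; not actual Heron code.
# The mode strings come from the proposal; every function name is an assumption.

SYSTEM = "system"        # per-topology default: defer to the system-wide setting
DISABLED = "disabled"    # proposed initial system-wide default
SYSTEM_WIDE_MODES = {DISABLED, "dryrun", "self-healing"}
TOPOLOGY_MODES = SYSTEM_WIDE_MODES | {SYSTEM}


def resolve_hm_mode(topology_mode=SYSTEM, system_default=DISABLED):
    """Return the effective mode the scheduler would pass to the executor."""
    if topology_mode not in TOPOLOGY_MODES:
        raise ValueError("unknown topology HM mode: %r" % (topology_mode,))
    if system_default not in SYSTEM_WIDE_MODES:
        raise ValueError("unknown system-wide HM mode: %r" % (system_default,))
    # "system" means the topology did not choose, so the default applies.
    return system_default if topology_mode == SYSTEM else topology_mode


def should_launch_hm(effective_mode):
    """The executor launches HM for every mode except "disabled"."""
    return effective_mode != DISABLED
```

With the proposed initial default, a topology that does not opt in resolves to "disabled" and no HM process is started, while a topology that sets "dryrun" gets HM regardless of the system-wide value.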
