If we enable it in healthmgr.yaml, it becomes cluster-wide, which is not the desired option. For the CLI, no code changes are needed: the config property function (--config-property) is already built in. All you have to do is decide what the key and its value should be.
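
To make the key/value question concrete, here is a rough sketch of how a per-topology value passed with --config-property could take precedence over a cluster-wide default. The two key names are simply the ones floated in the quoted messages of this thread and are not settled; `parse_config_property` and `resolve_hm_mode` are hypothetical helpers written for illustration, not existing Heron functions.

```python
# Hypothetical sketch: none of these helpers exist in Heron.
# Key from the heron submit example in this thread; not final.
TOPOLOGY_KEY = "heron.config.topology.healthmanager.mode"
# Cluster-wide default key proposed in this thread.
DEFAULT_KEY = "heron.healthmgr.default.mode"


def parse_config_property(prop):
    """Split one 'key=value' string, as passed to --config-property."""
    key, sep, value = prop.partition("=")
    if not sep or not key.strip():
        raise ValueError("expected key=value, got %r" % prop)
    return key.strip(), value.strip()


def resolve_hm_mode(cli_properties, cluster_defaults):
    """Return the per-topology mode if given, else the cluster-wide default."""
    overrides = dict(parse_config_property(p) for p in cli_properties)
    if TOPOLOGY_KEY in overrides:
        return overrides[TOPOLOGY_KEY]
    # Fall back to the cluster default; "disabled" if nothing is configured.
    return cluster_defaults.get(DEFAULT_KEY, "disabled")
```

With this shape, a topology submitted without the property keeps whatever the cluster default says, so nothing changes for existing users until they opt in.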
cheers
/karthik

On Fri, Aug 4, 2017 at 10:51 AM, Sanjeev Kulkarni <sanjee...@gmail.com> wrote:

> Enabling health manager doesn't sound like an API. Thus I agree that Config
> is not the right place for a setting like this.
> I also don't like overloading cli with this. IMO cli is already overloaded
> with a bunch of things that it shouldn't be.
> Why can't we make this part of healthmgr.yaml itself? Or maybe
> heron_internals.yaml?
>
> On Fri, Aug 4, 2017 at 10:12 AM, Karthik Ramasamy <kart...@streaml.io>
> wrote:
>
> > Ashvin -
> >
> > Instead of adding a Config API to enable self-healing per topology, an
> > interested user can enable the config using --config-property during
> > heron submit. For example,
> >
> > heron submit <cluster-name> --config-property
> > "heron.config.topology.healthmanager.mode=enable" <topology-file>
> > <topology-class> <topology-name>
> >
> > The advantage of this approach is that there is no hard coded config in
> > the code that will require later removal. Thoughts?
> >
> > cheers
> > /karthik
> >
> >
> > On Fri, Aug 4, 2017 at 8:57 AM, Ashvin A <ash...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > We are in the process of merging the core building blocks of the
> > > topology health manager (HM) based on Dhalion. This integration is
> > > still experimental and needs to be tested thoroughly. So it is desired
> > > that the HM be activated on-demand and remain disabled by default.
> > > Accordingly we are proposing the following scheme to launch the HM
> > > process.
> > >
> > > We are thinking of satisfying the following constraints:
> > >
> > >    1. Launch on container-0, colocated with the scheduler and the
> > >    metrics cache.
> > >    2. Initially HM will be disabled by default. This means the HM
> > >    process should not be started, to avoid any side-effects. Once HM is
> > >    well tested, a system wide configuration would enable HM for all
> > >    topologies submitted afterwards.
> > >    3. If a topology explicitly opts in through its configuration, HM
> > >    will be started and take actions as per the configuration, i.e.
> > >    healthmgr.yaml
> > >    4. Like other Heron processes, the executor should manage the HM's
> > >    life cycle
> > >
> > > Accordingly we propose the following.
> > >
> > >    1. Add a new Config API to enable self-healing per topology:
> > >    Config.enableHealthManager(Topology.HealthManagerMode mode). The
> > >    default value will be "system" to indicate use of the system wide
> > >    configuration.
> > >    2. Add a new config to heron_internals.yaml:
> > >    "heron.healthmgr.default.mode". The value will be "disabled".
> > >    3. The Scheduler will read the default value of HM mode from the
> > >    heron_internals config file, as done in SchedulerMain.setupLogging
> > >    [3]. It will provide either the user configured mode value or the
> > >    default mode value to the executor as a command line argument.
> > >    4. Add HM mode to the command line arguments to heron_executor.py.
> > >    This is similar to the executor command line arguments for
> > >    checkpointing [2].
> > >    5. The executor will launch HM if mode is not "disabled".
> > >    6. Later if the default HM mode value is set to "dryrun" or
> > >    "self-healing", HM will be launched for all newly submitted
> > >    topologies.
> > >
> > >
> > > What do you think about this approach?
> > >
> > > Thanks,
> > > Ashvin
> > >
> > >
> > > [1] https://github.com/twitter/heron/pull/2132
> > > [2] https://github.com/twitter/heron/blob/master/heron/executor/src/python/heron_executor.py#L58
> > > [3] https://github.com/twitter/heron/blob/master/heron/scheduler-core/src/java/com/twitter/heron/scheduler/SchedulerMain.java#L277
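
For reference, steps 4 and 5 of the original proposal (pass the HM mode to the executor as a command line argument, launch HM unless the mode is "disabled") could look roughly like the sketch below. This is only an illustration under assumptions: the flag name `--health-manager-mode` is made up here and is not an actual heron_executor.py argument, and the mode values are just the ones named in the proposal.

```python
import argparse

DISABLED = "disabled"
# Mode names mentioned in the proposal; the final set is not decided.
KNOWN_MODES = (DISABLED, "dryrun", "self-healing")


def build_parser():
    """Build a parser with a single, hypothetical HM mode flag."""
    parser = argparse.ArgumentParser(description="executor argument sketch")
    # "--health-manager-mode" is an assumed flag name, not the real executor argument.
    parser.add_argument("--health-manager-mode",
                        choices=KNOWN_MODES, default=DISABLED)
    return parser


def should_launch_hm(mode):
    """Step 5 of the proposal: launch HM unless the mode is "disabled"."""
    return mode != DISABLED


def main(argv):
    """Parse the executor arguments and report whether HM would be started."""
    args = build_parser().parse_args(argv)
    return should_launch_hm(args.health_manager_mode)
```

Because the flag defaults to "disabled", an executor started without it would behave exactly as it does today.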