Thanks for sending it again. I looked at the code. Even though the retry is handled on the participant, it looks like we are not setting the retry count for state transition messages. We do have the ability to set it for custom message types.

The fix is easy: we just need to call message.setRetryCount in this class:
https://github.com/apache/helix/blob/9e51cb7bdf8424df46c6fa353e7c80d984c21193/helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationStage.java
We can read the retry count from the cluster config.

I recently sent another email with instructions for setting up a distributed controller. In short, the steps are:

    helixadmin create-cluster super_cluster
    helixadmin addInstance super_cluster controller1
    helixadmin addInstance super_cluster controller2
    helixadmin addInstance super_cluster controller3

Then start the three controllers in distributed mode and provide super_cluster as the cluster name.

Now, any time you create a cluster, you can add that cluster as a resource in super_cluster. One of the controllers will automatically start managing the new cluster. For example:

    helixadmin create-cluster cluster1
    helixadmin addresource super_cluster cluster1 AUTO mode leaderstandbymodel

I don't remember the exact commands off the top of my head, but it should look something like that.

Yes, reset will be called when you lose the ZK session. It will also be invoked when a partition goes to the ERROR state and you want to get it back to the OFFLINE state. (I am not 100% sure whether the reset API is invoked or the ERROR to OFFLINE transition is invoked.) Jason might be able to answer that.

Hope that helps.

On Wed, Jan 27, 2016 at 10:51 AM, Subramanian Raghunathan <[email protected]> wrote:

> Hi Helix Team,
>
> I am evaluating Helix as a cluster management framework. I believe it’s
> very modular and highly customizable, with a variety of out-of-the-box
> capabilities. Kudos to the team!
>
> I have the below queries:
>
> 1) How to configure the number of retries in state transition handlers?
>
>    http://markmail.org/message/vgc4nksocolqiqx5
>
>    I referred to this particular mail conversation: “you can configure
>    the number of retries before a transition is considered as failed”
>
> 2) Please point me to an example/interfaces for starting a distributed
>    cluster controller and how to add the various clusters that the
>    controllers are supposed to manage.
>
> 3) What would be the event life cycle of the reset() method in
>    TransitionHandler?
>
>    a. I believe this gets called if the ZooKeeper client session is lost
>       or there’s an update to the cluster configuration.
>
> Note: I am using the “helix-0.7.1” version.
>
> Thanks & Regards,
> Subramanian Raghunathan
