Thanks for sending it again.

I looked at the code. Even though the retry is handled on the participant,
it looks like we are not setting the retry count for state transition
messages. We do have the ability to set it for custom message types.
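To illustrate what "retry on the participant" means, here is a rough, self-contained sketch of a participant honoring a per-message retry count: it runs the transition once, retries up to retryCount more times, and reports ERROR only after every attempt has failed. TransitionMessage, Transition and executeWithRetries are hypothetical stand-ins, not Helix classes. Treating retryCount as "extra attempts after the first" is also an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TransitionRetrySketch {

    /** Hypothetical stand-in for a transition callback such as onBecomeOnlineFromOffline. */
    interface Transition {
        void run() throws Exception;
    }

    /** Hypothetical stand-in for a state transition message that carries a retry count. */
    static final class TransitionMessage {
        final String fromState;
        final String toState;
        final int retryCount; // assumed: extra attempts allowed after the first failure

        TransitionMessage(String fromState, String toState, int retryCount) {
            this.fromState = fromState;
            this.toState = toState;
            this.retryCount = retryCount;
        }
    }

    /**
     * Runs the transition once, then retries it up to msg.retryCount more times.
     * Returns the target state on success, or "ERROR" once every attempt has failed.
     */
    static String executeWithRetries(TransitionMessage msg, Transition transition) {
        int maxAttempts = 1 + Math.max(0, msg.retryCount);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                transition.run();
                return msg.toState;
            } catch (Exception e) {
                System.out.printf("attempt %d/%d of %s->%s failed: %s%n",
                        attempt, maxAttempts, msg.fromState, msg.toState, e.getMessage());
            }
        }
        return "ERROR";
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fails on the first two calls, succeeds on the third.
        Transition flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("transient failure");
            }
        };
        // retryCount = 2 allows three attempts in total, so this reaches ONLINE.
        System.out.println(executeWithRetries(new TransitionMessage("OFFLINE", "ONLINE", 2), flaky));
        calls.set(0);
        // retryCount = 1 allows only two attempts, so this ends in ERROR.
        System.out.println(executeWithRetries(new TransitionMessage("OFFLINE", "ONLINE", 1), flaky));
    }
}
```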

The fix is easy: we just need to call message.setRetryCount() in this class:

https://github.com/apache/helix/blob/9e51cb7bdf8424df46c6fa353e7c80d984c21193/helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationStage.java

We can read the retry count from the cluster config.
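Roughly, the fix could look like the sketch below. It is not Helix code: Message is a minimal stand-in that only mirrors the setRetryCount call, the cluster config is modeled as a plain map, and the key name STATE_TRANSITION_RETRY_COUNT is hypothetical, so use whatever key actually gets added to the cluster config.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RetryCountFromConfigSketch {

    // Hypothetical config key; not necessarily a key that Helix 0.7.1 defines.
    static final String RETRY_COUNT_KEY = "STATE_TRANSITION_RETRY_COUNT";

    /** Minimal stand-in for a Helix Message; only mirrors the setRetryCount call. */
    static final class Message {
        final String partition;
        final String fromState;
        final String toState;
        private int retryCount;

        Message(String partition, String fromState, String toState) {
            this.partition = partition;
            this.fromState = fromState;
            this.toState = toState;
        }

        void setRetryCount(int retryCount) { this.retryCount = retryCount; }

        int getRetryCount() { return retryCount; }
    }

    /** Reads the retry count from a cluster-level config map; missing or invalid values mean 0 retries. */
    static int readRetryCount(Map<String, String> clusterConfig) {
        String raw = clusterConfig.get(RETRY_COUNT_KEY);
        if (raw == null) {
            return 0;
        }
        try {
            return Math.max(0, Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /**
     * Simplified message generation: one message per partition whose current state differs
     * from its target state, each stamped with the configured retry count. A real controller
     * would follow the state model's transition path; this sketch jumps straight to the target.
     */
    static List<Message> generateMessages(Map<String, String> currentStates,
                                          Map<String, String> targetStates,
                                          Map<String, String> clusterConfig) {
        int retryCount = readRetryCount(clusterConfig);
        List<Message> messages = new ArrayList<>();
        for (Map.Entry<String, String> entry : targetStates.entrySet()) {
            String partition = entry.getKey();
            String current = currentStates.getOrDefault(partition, "OFFLINE");
            if (!current.equals(entry.getValue())) {
                Message message = new Message(partition, current, entry.getValue());
                message.setRetryCount(retryCount); // the missing call for state transition messages
                messages.add(message);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        Map<String, String> clusterConfig = new TreeMap<>();
        clusterConfig.put(RETRY_COUNT_KEY, "3");
        Map<String, String> current = new TreeMap<>();
        current.put("p0", "OFFLINE");
        current.put("p1", "ONLINE");
        Map<String, String> target = new TreeMap<>();
        target.put("p0", "ONLINE");
        target.put("p1", "ONLINE");
        for (Message m : generateMessages(current, target, clusterConfig)) {
            System.out.println(m.partition + " " + m.fromState + "->" + m.toState
                    + " retries=" + m.getRetryCount());
        }
    }
}
```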

There was another email I recently sent with instructions for setting up a
distributed controller. In short, the steps are:

helixadmin create-cluster super_cluster
helixadmin addInstance super_cluster controller1
helixadmin addInstance super_cluster controller2
helixadmin addInstance super_cluster controller3

Start the three controllers in distributed mode and provide super_cluster as
the cluster name.

Now, any time you create a cluster, you can add it as a resource in
super_cluster. One of the controllers will automatically start managing the
new cluster. For example:
helixadmin create-cluster cluster1
helixadmin addresource super_cluster cluster1 AUTO mode leaderstandbymodel

I don't remember the exact commands off the top of my head, but they should
look something like that.
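To make the failover behavior concrete, here is a toy model of what super_cluster does: each managed cluster is a resource with a single leader/standby partition, and exactly one live controller leads it at a time. The class, its method names and the sticky least-loaded placement rule are illustrative assumptions, not Helix's rebalancer or its API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class SuperClusterSketch {

    private final SortedSet<String> liveControllers = new TreeSet<>();
    private final SortedSet<String> managedClusters = new TreeSet<>();
    // managed cluster -> controller currently acting as its leader
    private final Map<String, String> leaders = new TreeMap<>();

    void controllerUp(String controller) { liveControllers.add(controller); }

    void controllerDown(String controller) { liveControllers.remove(controller); }

    /** Analogous to adding a cluster as a resource in super_cluster. */
    void addManagedCluster(String cluster) { managedClusters.add(cluster); }

    /**
     * Keeps existing leaders that are still alive and hands every orphaned cluster to the
     * least-loaded live controller (ties go to the alphabetically first name).
     */
    Map<String, String> rebalance() {
        leaders.values().removeIf(controller -> !liveControllers.contains(controller));
        for (String cluster : managedClusters) {
            if (!leaders.containsKey(cluster) && !liveControllers.isEmpty()) {
                leaders.put(cluster, leastLoadedController());
            }
        }
        return Collections.unmodifiableMap(new TreeMap<>(leaders));
    }

    private String leastLoadedController() {
        Map<String, Integer> load = new HashMap<>();
        for (String controller : liveControllers) {
            load.put(controller, 0);
        }
        for (String controller : leaders.values()) {
            load.merge(controller, 1, Integer::sum);
        }
        String best = null;
        for (String controller : liveControllers) { // iterates in sorted order
            if (best == null || load.get(controller) < load.get(best)) {
                best = controller;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SuperClusterSketch superCluster = new SuperClusterSketch();
        superCluster.controllerUp("controller1");
        superCluster.controllerUp("controller2");
        superCluster.controllerUp("controller3");
        superCluster.addManagedCluster("cluster1");
        superCluster.addManagedCluster("cluster2");
        System.out.println(superCluster.rebalance()); // {cluster1=controller1, cluster2=controller2}
        superCluster.controllerDown("controller1");
        System.out.println(superCluster.rebalance()); // {cluster1=controller3, cluster2=controller2}
    }
}
```

The assignment is sticky, so when controller1 dies only cluster1 moves; cluster2 keeps its leader. That mirrors the idea that a standby controller takes over only the clusters whose leader went away.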

Yes, reset will be called when you lose the ZooKeeper session. It will also
be invoked when a partition goes to the ERROR state and you want to get it
back to the OFFLINE state. (I am not 100% sure whether the reset API or the
ERROR to OFFLINE transition is invoked in that case.) Jason might be able to
answer that.
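Here is a hypothetical sketch of the two paths described above, to make the difference easy to see: reset() on session loss, and an explicit ERROR to OFFLINE transition for a failed partition. PartitionStateModel and its methods are stand-ins, not the actual Helix TransitionHandler API. Since which callback Helix uses for ERROR recovery is left open above, treat that part as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ResetLifecycleSketch {

    /** Hypothetical per-partition state model, loosely modeled on a participant's transition handler. */
    static class PartitionStateModel {
        private String state = "OFFLINE";
        final List<String> events = new ArrayList<>();

        String getState() { return state; }

        void onBecomeOnlineFromOffline() {
            requireState("OFFLINE");
            state = "ONLINE";
            events.add("OFFLINE->ONLINE");
        }

        /** Simulates a transition that threw, which leaves the partition in ERROR. */
        void failTransition() {
            state = "ERROR";
            events.add("->ERROR");
        }

        /** Assumed recovery path: an explicit ERROR->OFFLINE transition for a partition stuck in ERROR. */
        void onBecomeOfflineFromError() {
            requireState("ERROR");
            state = "OFFLINE";
            events.add("ERROR->OFFLINE");
        }

        /** Session-loss path: release local resources and return to the initial state. */
        void reset() {
            state = "OFFLINE";
            events.add("reset");
        }

        private void requireState(String expected) {
            if (!state.equals(expected)) {
                throw new IllegalStateException("expected " + expected + " but was " + state);
            }
        }
    }

    public static void main(String[] args) {
        PartitionStateModel model = new PartitionStateModel();
        model.onBecomeOnlineFromOffline();
        model.reset(); // e.g. the ZooKeeper session expired
        model.onBecomeOnlineFromOffline();
        model.failTransition();
        model.onBecomeOfflineFromError(); // e.g. an operator resets the failed partition
        System.out.println(model.getState()); // OFFLINE
        System.out.println(model.events); // [OFFLINE->ONLINE, reset, OFFLINE->ONLINE, ->ERROR, ERROR->OFFLINE]
    }
}
```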

Hope that helps.


On Wed, Jan 27, 2016 at 10:51 AM, Subramanian Raghunathan <
[email protected]> wrote:

> Hi Helix Team,
>
> I am evaluating Helix as a cluster management framework. I believe it’s
> very modular and highly customizable, with a variety of out-of-the-box
> capabilities. Kudos to the team!
>
> I have the following queries:
>
> 1) How do I configure the number of retries in state transition handlers?
>
> http://markmail.org/message/vgc4nksocolqiqx5
>
> I referred to this particular mail conversation: “you can configure the
> number of retries before a transition is considered as failed”
>
> 2) Please point me to an example/interfaces for starting a distributed
> cluster controller and for adding the various clusters that the
> controllers are supposed to manage.
>
> 3) What would be the event life cycle of the reset() method in
> TransitionHandler?
>
> a. I believe this gets called if the ZooKeeper client session is lost or
> there’s an update to the cluster configuration.
>
> Note: I am using the “helix-0.7.1” version.
>
> Thanks & Regards,
>
> Subramanian Raghunathan
>
