Sorry I have not responded sooner; I am trying to catch up on the mailing list.
1. There is a patch for Nimbus HA, but it looks like it has been abandoned. If you want to pick it up and address the review comments, that would be great.
https://issues.apache.org/jira/browse/STORM-166
https://github.com/apache/incubator-storm/pull/61

2. The spout has its own timeout and will call fail on itself if it has not received an ack or fail message from the acker within the timeout interval. The acker also has a timeout, but it simply throws away the tree after that timeout and relies on the spout to time out as well. If the acker gets a fail message from a bolt, it propagates the fail to the spout and removes the tree. If the tree is ever fully acked, it propagates the ack message to the spout.

3. It is up to the spout to decide how it will replay the tuple. In some cases it can ask the pub/sub system to replay the tuple; in others it may not be able to, and all it can do is keep track of the failure.

4. If a worker exits, the supervisor will time it out and restart it. If the supervisor does not restart it fast enough, Nimbus may detect the timeout and reschedule it on a different supervisor.

5. The supervisor starts and stops workers. It also downloads their dependencies and cleans up after them.

- Bobby

On 7/24/14, 3:13 AM, "Zhang,Anzhan" <[email protected]> wrote:

>Dear all,
>I have several questions from my study of the Storm implementation and
>architecture. Although I read
>http://storm.incubator.apache.org/documentation/Home.html carefully,
>I still cannot find the answers, so I am writing this email to ask for
>your help. Any comments are much appreciated.
>
>1. Is Nimbus a singleton in a Storm cluster? I think it is single in
>the storm.yaml configuration file, since if more than one were supported
>it would be configured there. If so, why not allow more than one Nimbus
>and let ZK manage the leader election, so that Nimbus is HA and not a
>SPOF? If Nimbus dies, who takes responsibility for restarting it?
>
>2. The successful handling of a tuple is reported to the task by the
>acker, and the design of the acker is so, so excellent. My question is:
>how does the acker detect the failure of the tuple handler? Only when
>the ack val is not == 0 at timeout?
>
>3. If the acker reports the failure to the spout task, how does the
>spout task re-emit the tuple? Will it choose some other worker? If it
>emits the tuple to the same call stack, it may fail at the same place.
>
>4. If a worker exits, who takes responsibility for restarting the
>worker?
>
>5. What is the duty of the Supervisor? Just starting the defined
>number of workers?
>
>Thank you in advance!
>
>Best Regards
>Anzhan Zhang 张安站
>Baidu
>
>PS
>
>Ext: 3153
>Hi: anzhsoft
>Cubicle:F4-B180
>
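To make answer 2 above more concrete, here is a rough toy model of the spout and acker timeouts. It is only a sketch, not Storm code: ToySpout, ToyAcker, and their method names are hypothetical, time is passed in explicitly as a number of seconds, and the small integers stand in for the tuple ids that get XORed into the "ack val" mentioned in the question. It only illustrates the three outcomes described: a fully acked tree, an explicit fail from a bolt, and a tree that the acker silently drops while the spout times out on its own.

```python
class ToySpout:
    """Hypothetical stand-in for a spout; not the Storm API."""

    def __init__(self, timeout_secs):
        self.timeout_secs = timeout_secs
        self.pending = {}  # msg_id -> time the tuple was emitted
        self.acked = []
        self.failed = []

    def emit(self, msg_id, now):
        self.pending[msg_id] = now

    def ack(self, msg_id):
        if self.pending.pop(msg_id, None) is not None:
            self.acked.append(msg_id)

    def fail(self, msg_id):
        # A real spout decides here whether and how to replay the tuple.
        if self.pending.pop(msg_id, None) is not None:
            self.failed.append(msg_id)

    def tick(self, now):
        # The spout fails its own tuples once they are older than the timeout.
        expired = [m for m, t in self.pending.items()
                   if now - t >= self.timeout_secs]
        for msg_id in expired:
            self.fail(msg_id)


class ToyAcker:
    """Hypothetical stand-in for an acker task; not the Storm API."""

    def __init__(self, spout, timeout_secs):
        self.spout = spout
        self.timeout_secs = timeout_secs
        self.trees = {}  # msg_id -> [xor of outstanding tuple ids, start time]

    def init_tree(self, msg_id, tuple_id, now):
        self.trees[msg_id] = [tuple_id, now]

    def ack(self, msg_id, xor_update):
        entry = self.trees.get(msg_id)
        if entry is None:
            return
        entry[0] ^= xor_update
        if entry[0] == 0:  # every tuple in the tree has been acked
            del self.trees[msg_id]
            self.spout.ack(msg_id)

    def fail(self, msg_id):
        # A fail from a bolt is propagated to the spout and the tree removed.
        if self.trees.pop(msg_id, None) is not None:
            self.spout.fail(msg_id)

    def tick(self, now):
        # On timeout the acker only throws the tree away; it does not
        # notify the spout, which is expected to time out by itself.
        expired = [m for m, (_, t) in self.trees.items()
                   if now - t >= self.timeout_secs]
        for msg_id in expired:
            del self.trees[msg_id]


spout = ToySpout(timeout_secs=30)
acker = ToyAcker(spout, timeout_secs=30)

# m1: a bolt acks root tuple 0b1010 and emits child 0b0110, then the child is acked.
spout.emit("m1", now=0)
acker.init_tree("m1", 0b1010, now=0)
acker.ack("m1", 0b1010 ^ 0b0110)
acker.ack("m1", 0b0110)

# m2: a bolt explicitly fails the tuple.
spout.emit("m2", now=0)
acker.init_tree("m2", 0b0011, now=0)
acker.fail("m2")

# m3: nothing ever acks or fails it, so both sides time out independently.
spout.emit("m3", now=0)
acker.init_tree("m3", 0b0101, now=0)
acker.tick(now=30)
failed_after_acker_timeout = list(spout.failed)
spout.tick(now=30)

print(spout.acked, failed_after_acker_timeout, spout.failed)
```

In this model m3 only shows up in the spout's failed list after the spout's own tick, which is the point of answer 2: the acker timeout is just cleanup, and the failure the spout sees comes from its own timeout. What the spout does inside fail is the replay decision from answer 3.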
