Thanks for explaining the semantics, Kishore. Comments inline.
On Wed, Feb 20, 2013 at 4:28 PM, kishore g <[email protected]> wrote:

> Hi Abhishek,
>
> We only carry over the fact that the participant hosted that partition.
> The state of that partition will be reset to the initial state
> (default: OFFLINE).

I see, makes sense.

> The idea behind this design was to detect resource deletion while the
> participant was down and inform that participant when it comes back up to
> drop the data or any local state associated with that partition. Once the
> drop notification is handled, the partition will be removed from the
> current state and the external view.

I see. But who is responsible for "detecting resource deletion"? Does the
controller automatically set the ideal state to DROPPED for all partitions
on a restarted instance? Or is it the DDS's responsibility to detect that an
instance is down and therefore set the ideal state to DROPPED for all the
partitions it hosted?

> Can you confirm that resetting the state to OFFLINE after restart is a
> problem in your case.

In my case, the DDS was getting confused by the partition's current state
automatically recycling back to the initial state. Besides, I wonder whether
the controller will automatically start generating transitions from OFFLINE
toward the old ideal state (assuming the ideal state was not modified after
the instance died).

> If you really need to avoid this behavior then you can implement a
> preConnectCallback and remove the previous session info. This won't be a
> problem with future Helix versions, but you will still have to confirm
> that the old participant is dead. A better way would be to provide a way
> to explicitly specify a flag to not carry over the previous state. Can you
> please file a jira for this? I can imagine this being useful in various
> use cases.

Thanks for the suggestion. I'll file a jira.
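For readers following along, Kishore's carry-over semantics can be sketched as a tiny model. The class and method names below are mine, not Helix API (the real logic lives in ZKHelixManager's new-session handling): on reconnect, the participant keeps the fact that it hosted each partition, but every partition's state is reset to the state model's initial state.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the carry-over behavior described above: partition
// ownership survives the session change, but each partition's current
// state is reset to the initial state (OFFLINE by default).
public class SessionCarryOver {
    static Map<String, String> carryOver(Map<String, String> previousStates,
                                         String initialState) {
        Map<String, String> next = new HashMap<>();
        for (String partition : previousStates.keySet()) {
            next.put(partition, initialState); // keep partition, reset state
        }
        return next;
    }
}
```

This is exactly the behavior that confuses a DDS which assumes that losing the session also means losing the partitions.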
> thanks,
> Kishore G
>
> On Wed, Feb 20, 2013 at 3:58 PM, Abhishek Rai <[email protected]> wrote:
>
> > Hi Helix devs,
> >
> > Currently, when creating a session for a new participant, Helix carries
> > over the current states of assigned partitions from the previous session
> > of the same participant. I think this may be undesirable for deployments
> > where the Helix session and the partitions assigned to the participant
> > are tightly coupled. Assume that in such a setup, when a participant
> > loses a session, it also loses all associated partitions.
> >
> > In this scenario, when the participant is restarted and tries to
> > reconnect to Helix, ZKHelixManager (handleNewSessionAsParticipant)
> > currently "carries over" assignments from the previous session, which
> > may not reflect the true state of the restarted participant. Is there an
> > easy way to not carry over the state, in other words, to start from
> > scratch with no assigned partitions? If not, can you think of any
> > possible workarounds? I'm considering directly clearing the old "current
> > states" from Zookeeper. I'd like to avoid doing this for multiple
> > reasons: (1) compatibility with future Helix versions, and
> > (2) complexity: I need to make sure the old participant is really dead.
> >
> > Thanks,
> > Abhishek
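For anyone trying the preConnectCallback route Kishore suggests: Helix keeps per-session current state in ZooKeeper under /<cluster>/INSTANCES/<instance>/CURRENTSTATES/<sessionId>, so the callback can remove the subtrees left by dead sessions before the new session registers. The sketch below only computes which paths are stale so it stays self-contained; the class and method names are mine (not Helix API), and the actual recursive ZK delete is left as a comment. As the thread notes, you must separately confirm the old participant is really dead before deleting anything.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cleanup a preConnectCallback could perform: find the
// CURRENTSTATES subtrees belonging to old sessions of this instance.
// A real implementation would then delete each path recursively via
// its ZooKeeper client before letting the new session proceed.
public class StaleSessionCleanup {
    // Path layout assumed here: /<cluster>/INSTANCES/<instance>/CURRENTSTATES/<sessionId>
    static String currentStatePath(String cluster, String instance, String sessionId) {
        return "/" + cluster + "/INSTANCES/" + instance + "/CURRENTSTATES/" + sessionId;
    }

    // Every session except the live one is stale and should be removed.
    static List<String> stalePaths(String cluster, String instance,
                                   List<String> allSessions, String liveSession) {
        List<String> stale = new ArrayList<>();
        for (String s : allSessions) {
            if (!s.equals(liveSession)) {
                stale.add(currentStatePath(cluster, instance, s));
            }
        }
        return stale;
    }
}
```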
