Hi All,

Here's another design for Helix state transition cancellation. Could you please help me review it?

Best,
Junkai

Introduction

State transitions are a vital part of how Helix manages clusters. A state transition can become unnecessary for various reasons, for example when the node hosting the partition under transition goes down. State transition cancellation would therefore be a useful feature to have in Helix. It not only helps avoid invalid states but also reduces redundant state transitions. This document describes the design of state transition cancellation.

Problem Statement

There are a couple of situations in which a state transition may need to be cancelled:

- Cancel any ongoing state transition on an instance if Helix decides not to put any replicas on that instance. One example is a user adding new nodes one by one. When node 1 is added, partition A can be moved from the old nodes to node 1. After partition A has started its state transition on node 1, node 2 is added to the system, and partition A is now assigned to node 2 even though its state transition is already running on node 1. It is therefore better to cancel the state transition on node 1 and start the state transition on node 2. The following picture shows the process of adding nodes. In stage 3, if Helix does not cancel partition A's state transition, partition A will finish the state transition on node 1 and then go through another state transition to bring it back to the initial or previous state. This is the redundant work we are trying to avoid.
- Cancel all ongoing state transitions once a resource is deleted or disabled.
- Cancel all ongoing state transitions if the instance is disabled.
- Cancel all running tasks if the job is stopped, aborted or deleted. Since the job is in a non-running state, all of its running sub-tasks should be cancelled to stay consistent with the job status.

In any of these conditions, it is better for Helix to cancel the transition, either to save resources or to avoid invalid operations.

Existing Solutions

In the current Helix roadmap, cancellation is not supported.
This is because Helix will eventually bring the system to the ideal mapping, even if that takes a couple of redundant operations. If one of the situations mentioned above occurs in the real world, Helix will still wait for the state transition to finish and then issue a counter state transition to put the partition in the right state. If the state transition is stuck for a long time, the state transition timeout will resolve the problem.

Proposed Design

Cancel Logic

From the Helix perspective, Helix should cancel state transitions that may cause redundant work or invalid operations. When a state transition needs to be cancelled, Helix will issue a message to the participant. The participant is then notified that the corresponding partition's state transition is cancelled.

- *Ignored*: The participant can ignore the cancellation if the user believes the state transition is very important or hard to cancel. The participant keeps doing the state transition and finishes the callback as usual. For example, if a partition is cancelled during bootstrapping, there might be some data loss, or the cancel operation might not be allowed at all. In this case, the user can simply ignore the cancel flag and continue with the state transition.

However, the final result of the cancellation depends on the user's implementation. There are two options the user can choose for the reaction to a cancellation and its final result:

- *Option 1:*
  - *Cancelled to Previous State*: The state transition is cancelled and the partition is forced back to its previous state.
  - *Throw Error Exception*: If anything fails during rollback, or the previous state is not a proper target state, the user can throw an exception and Helix puts the partition into the error state.
- *Option 2:*
  - *Cancelled to Cancel State*: The state transition is cancelled and the partition is set to the cancel state. In the following steps, Helix then performs the user-defined state transitions from the cancel state to other states.
  - *Throw Error Exception*: If anything fails during rollback, the user can throw an exception and the partition is put into the error state.

Cancel Failed

What if the cancellation fails? In that case Helix leaves the partition as it is, because a failed cancellation falls into one of the following situations:

- Ignored by participant: As the cancellation message is ignored, Helix will wait until the state transition finishes, after which the participant receives a message for the counter state transition.
- State Transition Stuck: If the state transition is stuck or the partition is busy, the state transition will be terminated by the state transition timeout.
- Fail to Rollback or Clean Up: If the user-implemented clean-up or rollback fails, a normal exception is thrown and the partition is set to the error state.

In sum, Helix can handle each kind of state transition cancellation failure.

Message Handling

When a cancellation is issued, the state transition message to be cancelled can be in one of three states: not yet handled by the participant, handled by the participant but not started, or handled and already started. The cancellation is done accordingly for each of these three states.

- Message not handled by participant: Helix sends a message to cancel the previous state transition. The two messages may be received by the participant at the same time and not in the correct order. Once the participant has received them, it can sort them by CREATE_TIMESTAMP and then pre-handle them one by one. Batch messages need special attention so that they are handled correctly.
- Message handled and not started: On the participant side, Helix can try to cancel the future without interrupting running threads. If the cancel succeeds, the message had not started and is cancelled.
- Message handled and started: Once the message has started, Helix handles the cancellation as described previously.

Cancel ThreadPool

Helix will provide a dedicated thread pool for cancellation.
Since the operation is fast (it only sets the cancel flag), it is better to use a fixed-size thread pool created by Helix.

Responsibility of User

The StateModel implementation should take care of the clean-up or rollback mechanism, as appropriate for the state transition on that participant. If the clean-up or rollback fails, an exception should be thrown to Helix.

Implementation

Participant Perspective

In the StateModel, Helix will have a new field called *cancel*. Once Helix issues a cancellation message to the participant, the implementation of the state transition in *StateModel* should check the *cancel* flag periodically if the state transition is cancellable.

    public abstract class StateModel {
      ...
      // volatile: set by the cancellation thread, read by the state transition thread
      private volatile boolean _cancel;

      protected boolean isCancel() {
        return _cancel;
      }

      protected void setCancel(boolean cancel) {
        _cancel = cancel;
      }
      ...
    }

When the user does act on the flag, throwing the corresponding rollback exception tells Helix what the user decided.

    public abstract class RollbackException extends Exception {
    }

Depending on the situation, the user should provide different implementations:

- *Ignored*: If the user does not want to cancel the state transition for any reason, just ignore the flag, keep doing the state transition and finish the callback.
- *Option 1*: Check the flag to decide whether to cancel. Before throwing the rollback exception, the user has to do the clean-up or rollback, following this logic:

    check periodically {
      if (isCancel()) {
        if rollback and clean-up succeed:
          throw rollback exception   // partition is set to the previous state
        else:
          throw normal exception that states what happened
      }
    }

- *Option 2*: In addition, transitions from ANY state to the cancel state and from the cancel state to other states have to be defined in the state model definition. The rest of the implementation is similar to Option 1.

    check periodically {
      if (isCancel()) {
        if rollback and clean-up succeed:
          throw rollback exception   // partition is set to the cancel state
        else:
          throw normal exception that states what happened
      }
    }

Helix Core

In the HelixTaskExecutor, Helix should handle the different exceptions thrown by the user.

    public void onMessage(...) {
      try {
        process message
      } catch (RollbackException ce) {
        set partition to previous state (Option 1) or cancel state (Option 2)
      } catch (Exception e) {
        handle the exception and set to error state
      }
    }

Testing

Synchronous tests: We will have a couple of synchronous tests that run the staging phases to test the cancellation.

Asynchronous tests: Integrated with ZooKeeper, these start a Helix controller and build up a complete environment to test this feature end to end.
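
To make the participant-side check and the Helix Core exception handling above a bit more concrete, here is a minimal, self-contained Java sketch of the Option 1 flow. It is only an illustration under assumptions: `CancellationSketch`, `SketchStateModel`, `SteppedModel` and `runTransition` are hypothetical stand-ins made up for this example, not existing Helix classes or methods, and `RollbackException` is made concrete here only so the sketch can run. The cancel flag is set before the transition starts to keep the example deterministic; in a real participant it would be set from the cancellation thread while the transition is running.

```java
/** Hypothetical sketch: illustrative stand-ins only, not the real Helix classes. */
final class CancellationSketch {

  /** Signals that the transition was cancelled and rolled back cleanly. */
  static class RollbackException extends Exception {
    RollbackException(String message) {
      super(message);
    }
  }

  /** Stand-in for a state model that carries the proposed cancel flag. */
  abstract static class SketchStateModel {
    // volatile: written by a cancel thread, read by the transition thread.
    private volatile boolean cancel;

    boolean isCancel() {
      return cancel;
    }

    void setCancel(boolean cancel) {
      this.cancel = cancel;
    }

    abstract void transition() throws Exception;
  }

  /** A cancellable transition that works in steps and checks the flag before each one. */
  static class SteppedModel extends SketchStateModel {
    private final int totalSteps;
    private final boolean rollbackSucceeds; // assumption: simulates the rollback outcome
    int completedSteps = 0;

    SteppedModel(int totalSteps, boolean rollbackSucceeds) {
      this.totalSteps = totalSteps;
      this.rollbackSucceeds = rollbackSucceeds;
    }

    @Override
    void transition() throws Exception {
      for (int step = 0; step < totalSteps; step++) {
        if (isCancel()) {
          if (!rollbackSucceeds) {
            // Rollback failed: a normal exception sends the partition to ERROR.
            throw new IllegalStateException("rollback failed at step " + step);
          }
          completedSteps = 0; // undo the partial work
          throw new RollbackException("cancelled at step " + step);
        }
        completedSteps++; // one unit of real work would go here
      }
    }
  }

  /** Maps the outcome of a transition to the resulting partition state (Option 1). */
  static String runTransition(SketchStateModel model, String fromState, String toState) {
    try {
      model.transition();
      return toState; // finished normally
    } catch (RollbackException e) {
      return fromState; // cancelled and rolled back: previous state
    } catch (Exception e) {
      return "ERROR"; // anything else: error state
    }
  }

  public static void main(String[] args) {
    SteppedModel normal = new SteppedModel(3, true);
    System.out.println(runTransition(normal, "OFFLINE", "ONLINE")); // prints ONLINE

    SteppedModel cancelled = new SteppedModel(3, true);
    cancelled.setCancel(true);
    System.out.println(runTransition(cancelled, "OFFLINE", "ONLINE")); // prints OFFLINE

    SteppedModel broken = new SteppedModel(3, false);
    broken.setCancel(true);
    System.out.println(runTransition(broken, "OFFLINE", "ONLINE")); // prints ERROR
  }
}
```

For Option 2, the same sketch would return the cancel state instead of `fromState` in the `RollbackException` branch. An "Ignored" implementation would simply never call `isCancel()`.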