[ https://issues.apache.org/jira/browse/HELIX-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020134#comment-16020134 ]
ASF GitHub Bot commented on HELIX-654:
--------------------------------------

Github user kongweihan commented on a diff in the pull request:

    https://github.com/apache/helix/pull/88#discussion_r117837530

    --- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java ---
    @@ -455,6 +454,44 @@ private ResourceAssignment computeResourceMapping(String jobResource,
         return ra;
       }

    +  /**
    +   * If assignment is different from previous assignment, drop the old running task if it's no
    +   * longer assigned to the same instance, but not removing it from excludeSet because the same task
    +   * should not be assigned to the new instance right way.
    +   */
    +  private void dropRebalancedRunningTasks(Map<String, SortedSet<Integer>> newAssignment,
    +      Map<String, SortedSet<Integer>> oldAssignment, Map<Integer, PartitionAssignment> paMap,
    +      JobContext jobContext) {
    +    for (String instance : oldAssignment.keySet()) {
    +      for (Integer pId : oldAssignment.get(instance)) {
    +        if (jobContext.getPartitionState(pId) == TaskPartitionState.RUNNING
    +            && !newAssignment.get(instance).contains(pId)) {
    +          paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name()));
    +          jobContext.setPartitionState(pId, TaskPartitionState.DROPPED);
    --- End diff --

    I don't see where it gets updated. In the original code, if the CurrentState is null, an exception is thrown at line 287.


> Rebalance running task
> ----------------------
>
>                 Key: HELIX-654
>                 URL: https://issues.apache.org/jira/browse/HELIX-654
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>            Reporter: Weihan Kong
>
> h3. Feature summary
> The Helix Task Framework lets users run tasks on instances managed by
> Helix. There are two types of tasks: generic tasks and fixed target tasks. For
> a fixed target task, the task always follows the targeted partition and is
> rebalanced if the partition is rebalanced.
> For a generic task, Helix lets the user choose whether or not to rebalance the
> running task when the topology of the cluster changes.
> For most users, it is better to leave this feature disabled (the default), since
> there is no need to re-run the task every time a new node is added. For users
> with long-running tasks, enabling this feature can be very useful: when a new
> node is added, the task load is better balanced across the cluster.
> h3. Defined system behavior
> h4. When a node fails,
> h6. Feature disabled:
> * Running tasks on the failed node will be rebalanced to a live node, since
> the tasks no longer exist; they failed with the node.
> h6. Feature enabled:
> * Same.
> h4. When a new node is added,
> h6. Feature disabled:
> * Running tasks will continue to run on the current instance.
> * If a running task fails after a while, it might be rebalanced and run on
> another instance, according to the new rebalance assignment under the new
> cluster topology.
> h6. Feature enabled:
> * A running task might be cancelled and rebalanced immediately, according to
> the new rebalance assignment under the new cluster topology.
> h3. Configuration
> A job-level config field (RebalanceRunningTask) in JobConfig enables or disables
> this feature. By default it is false.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
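
For readers skimming the "Defined system behavior" section of the issue, the four cases can be written down as a small decision function. This is only a sketch of the described behavior, not Helix code: the class RebalanceRunningTaskSketch, the enums TopologyChange and Action, and the method actionForRunningTask are hypothetical names made up for illustration. Only the RebalanceRunningTask flag and its default of false come from the description.

```java
import java.util.Objects;

// Hypothetical illustration of the behavior table in HELIX-654.
// None of these types exist in Helix; they only restate the description.
public class RebalanceRunningTaskSketch {

  /** The two topology changes the description covers. */
  enum TopologyChange { NODE_FAILED, NODE_ADDED }

  /** What is expected to happen to a task that was running when the change occurred. */
  enum Action { REASSIGN_TO_LIVE_NODE, KEEP_ON_CURRENT_INSTANCE, MAY_CANCEL_AND_REASSIGN }

  /**
   * @param change the topology change that was observed
   * @param rebalanceRunningTask value of the job-level RebalanceRunningTask flag (default false)
   */
  static Action actionForRunningTask(TopologyChange change, boolean rebalanceRunningTask) {
    Objects.requireNonNull(change, "change");
    switch (change) {
      case NODE_FAILED:
        // Applies to tasks that ran on the failed node; same outcome with the flag on or off.
        return Action.REASSIGN_TO_LIVE_NODE;
      case NODE_ADDED:
        // Flag off: the task stays put (it may still move later if it fails and is retried).
        // Flag on: the task may be cancelled and moved right away.
        return rebalanceRunningTask
            ? Action.MAY_CANCEL_AND_REASSIGN
            : Action.KEEP_ON_CURRENT_INSTANCE;
      default:
        throw new IllegalArgumentException("Unknown change: " + change);
    }
  }

  public static void main(String[] args) {
    // Print all four combinations from the description.
    for (TopologyChange change : TopologyChange.values()) {
      for (boolean enabled : new boolean[] {false, true}) {
        System.out.println(change + ", RebalanceRunningTask=" + enabled
            + " -> " + actionForRunningTask(change, enabled));
      }
    }
  }
}
```

The flag only changes the outcome in the "new node added" case, which is presumably why the description recommends leaving it off unless tasks are long-running.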