[ https://issues.apache.org/jira/browse/HELIX-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020134#comment-16020134 ]

ASF GitHub Bot commented on HELIX-654:
--------------------------------------

Github user kongweihan commented on a diff in the pull request:

    https://github.com/apache/helix/pull/88#discussion_r117837530
  
    --- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java ---
    @@ -455,6 +454,44 @@ private ResourceAssignment computeResourceMapping(String jobResource,
         return ra;
       }
     
    +  /**
    +   * If the assignment differs from the previous assignment, drop the old running task if it is
    +   * no longer assigned to the same instance, but do not remove it from excludeSet, because the
    +   * same task should not be assigned to the new instance right away.
    +   */
    +  private void dropRebalancedRunningTasks(Map<String, SortedSet<Integer>> newAssignment,
    +      Map<String, SortedSet<Integer>> oldAssignment, Map<Integer, PartitionAssignment> paMap,
    +      JobContext jobContext) {
    +    for (String instance : oldAssignment.keySet()) {
    +      for (Integer pId : oldAssignment.get(instance)) {
    +        if (jobContext.getPartitionState(pId) == TaskPartitionState.RUNNING
    +            && !newAssignment.get(instance).contains(pId)) {
    +          paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name()));
    +          jobContext.setPartitionState(pId, TaskPartitionState.DROPPED);
    --- End diff --
    
    I didn't see where it gets updated. In the original code, if the CurrentState is null, it will throw an exception at line 287.
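A simplified, self-contained sketch of the drop step in the hunk above can help when reasoning about this review comment. It is not Helix code. `TaskPartitionState`, `PartitionAssignment`, and the `partitionStates` map are hypothetical stand-ins for the Helix types and for `JobContext`. The `getOrDefault` guard, which treats an instance missing from the new assignment as owning no partitions, is an assumption of this sketch and does not appear in the diff.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class DropRebalancedRunningTasksSketch {
  // Hypothetical stand-in for Helix's TaskPartitionState (only the states used here).
  enum TaskPartitionState { RUNNING, COMPLETED, DROPPED }

  // Hypothetical stand-in for PartitionAssignment: target instance plus requested state.
  static final class PartitionAssignment {
    final String instance;
    final String requestedState;

    PartitionAssignment(String instance, String requestedState) {
      this.instance = instance;
      this.requestedState = requestedState;
    }
  }

  // Marks every RUNNING partition that left its old instance as DROPPED on that old instance.
  // partitionStates plays the role of JobContext's partition-state getter/setter.
  static void dropRebalancedRunningTasks(Map<String, SortedSet<Integer>> newAssignment,
      Map<String, SortedSet<Integer>> oldAssignment, Map<Integer, PartitionAssignment> paMap,
      Map<Integer, TaskPartitionState> partitionStates) {
    for (Map.Entry<String, SortedSet<Integer>> entry : oldAssignment.entrySet()) {
      String instance = entry.getKey();
      // Assumption: an instance absent from the new assignment owns no partitions.
      Set<Integer> stillAssigned =
          newAssignment.getOrDefault(instance, Collections.emptySortedSet());
      for (Integer pId : entry.getValue()) {
        if (partitionStates.get(pId) == TaskPartitionState.RUNNING
            && !stillAssigned.contains(pId)) {
          paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name()));
          partitionStates.put(pId, TaskPartitionState.DROPPED);
        }
      }
    }
  }

  public static void main(String[] args) {
    Map<String, SortedSet<Integer>> oldAssignment = new HashMap<>();
    oldAssignment.put("instance_0", new TreeSet<>(Arrays.asList(0, 1)));
    oldAssignment.put("instance_1", new TreeSet<>(Arrays.asList(2)));

    // A new node joins and partition 1 moves from instance_0 to instance_2.
    Map<String, SortedSet<Integer>> newAssignment = new HashMap<>();
    newAssignment.put("instance_0", new TreeSet<>(Arrays.asList(0)));
    newAssignment.put("instance_1", new TreeSet<>(Arrays.asList(2)));
    newAssignment.put("instance_2", new TreeSet<>(Arrays.asList(1)));

    Map<Integer, TaskPartitionState> states = new HashMap<>();
    states.put(0, TaskPartitionState.RUNNING);
    states.put(1, TaskPartitionState.RUNNING);
    states.put(2, TaskPartitionState.COMPLETED);

    Map<Integer, PartitionAssignment> paMap = new HashMap<>();
    dropRebalancedRunningTasks(newAssignment, oldAssignment, paMap, states);
    for (Map.Entry<Integer, PartitionAssignment> e : paMap.entrySet()) {
      // prints: 1 -> instance_0 DROPPED
      System.out.println(e.getKey() + " -> " + e.getValue().instance + " "
          + e.getValue().requestedState);
    }
  }
}
```

In the example, only partition 1 is dropped. It was RUNNING on instance_0, and the new assignment moves it to instance_2. Partition 2 is skipped because it is not RUNNING.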


> Rebalance running task
> ----------------------
>
>                 Key: HELIX-654
>                 URL: https://issues.apache.org/jira/browse/HELIX-654
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>            Reporter: Weihan Kong
>
> h3. Feature summary
> Helix Task Framework lets users run tasks on instances managed by
> Helix. There are two types of tasks: generic tasks and fixed-target tasks.
> A fixed-target task always follows its targeted partition and is
> rebalanced whenever that partition is rebalanced. For generic tasks, Helix
> lets the user choose whether to rebalance running tasks when the topology
> of the cluster changes.
> For most users, it is better to leave this feature disabled (the default),
> since there is no need to re-run a task every time a new node is added.
> For users with long-running tasks, enabling this feature can be very
> useful: when a new node is added, the task load is better balanced across
> the cluster.
> h3. Defined system behavior
> h4. When a node fails,
> h6. Feature disabled:
> * Running tasks on the failed node are rebalanced to a live node, since
> they failed along with the node and no longer exist.
> h6. Feature enabled:
> * Same.
> h4. When a new node is added,
> h6. Feature disabled:
> * Running tasks will continue to run on the current instance.
> * If a running task fails after a while, it might be rebalanced and run on 
> other instances, according to the new rebalance assignment under the new 
> cluster topology.
> h6. Feature enabled:
> * Running tasks might be cancelled and rebalanced immediately, according to
> the new rebalance assignment under the new cluster topology.
> h3. Configuration
> A job-level config field in JobConfig (RebalanceRunningTask) enables or
> disables this feature. It defaults to false.
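The defined behavior and the RebalanceRunningTask flag can be summarized as a small decision function. This is a hypothetical sketch and not Helix's API. The `decide` method, the `Action` enum, and its three boolean inputs are illustrative names chosen here. Only the flag's meaning and its default of false come from the issue.

```java
public class RebalanceRunningTaskPolicySketch {
  // Hypothetical outcome for a task that is currently RUNNING.
  enum Action { KEEP_RUNNING, REASSIGN }

  /**
   * Hypothetical decision rule following the behavior described in HELIX-654 (not a Helix API).
   *
   * @param rebalanceRunningTask value of the job-level RebalanceRunningTask flag (default false)
   * @param instanceLive whether the instance running the task is still live
   * @param stillAssignedHere whether the new assignment keeps the task on that instance
   */
  static Action decide(boolean rebalanceRunningTask, boolean instanceLive,
      boolean stillAssignedHere) {
    if (!instanceLive) {
      // Node failure: the task failed with the node, so it moves regardless of the flag.
      return Action.REASSIGN;
    }
    if (rebalanceRunningTask && !stillAssignedHere) {
      // Feature enabled and the new assignment moved the task: cancel it and move it now.
      return Action.REASSIGN;
    }
    // Feature disabled (the default) or assignment unchanged: keep running in place.
    return Action.KEEP_RUNNING;
  }

  public static void main(String[] args) {
    System.out.println(decide(false, false, false)); // node failed, feature disabled
    System.out.println(decide(true, false, false)); // node failed, feature enabled
    System.out.println(decide(false, true, false)); // node added, feature disabled
    System.out.println(decide(true, true, false)); // node added, feature enabled
  }
}
```

The four calls print REASSIGN, REASSIGN, KEEP_RUNNING, and REASSIGN. In the enabled, node-added case, a task is reassigned only when the new assignment actually moves it, which is why the issue says a running task "might" be cancelled. The sketch does not model the disabled-case note that a task which later fails may be placed elsewhere.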



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
