GitHub user narendly opened a pull request:

    https://github.com/apache/helix/pull/293

    PR

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/narendly/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/293.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #293
    
----
commit bced0996ed65c9a886b5e04788e2cc1c88fc37b1
Author: narendly <narendly@...>
Date:   2018-11-02T21:02:02Z

    [HELIX-786] TEST: Make TestQuotaBasedScheduling stable
    
    Because recent changes caused the Controller to run slower,
    TestQuotaBasedScheduling became unstable. This RB fixes the instability.
    Changelist:
    1. Use polling instead of Thread.sleep()
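The polling change can be illustrated with a minimal Java sketch. The class PollingWait and its pollUntil(condition, timeoutMs, intervalMs) method are hypothetical and are not the helper used in the actual test. The idea is that the test re-checks a condition at a short interval until it holds or a timeout expires. This avoids sleeping for a fixed period that may turn out to be too short once the Controller runs slower.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical polling helper (not the actual Helix test utility):
 * repeatedly evaluates a condition until it holds or a timeout elapses,
 * instead of sleeping for a fixed duration.
 */
public final class PollingWait {
  private PollingWait() {}

  /**
   * Returns true as soon as the condition returns true. Returns false if the
   * condition is still false once timeoutMs has elapsed, or if the thread is
   * interrupted while waiting.
   */
  public static boolean pollUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return false;
      }
    }
  }

  public static void main(String[] args) {
    final long start = System.currentTimeMillis();
    // The condition becomes true after roughly 200 ms, standing in for
    // "the Controller has finished scheduling the expected tasks".
    boolean ok = pollUntil(() -> System.currentTimeMillis() - start >= 200, 5000, 50);
    System.out.println("condition met: " + ok);
  }
}
```

Compared with a fixed Thread.sleep(), the test finishes as soon as the expected state is reached and only fails after the full timeout, which makes it far less sensitive to Controller speed.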

commit dc25bac1ebdcddb08aaab2765abfe72008b06a31
Author: narendly <narendly@...>
Date:   2018-11-02T21:03:16Z

    [HELIX-786] TASK: Fix stuck tasks after Participant connection loss
    
    When Helix Participants lose their ZK connection and establish a new ZK
    session, all task partitions on those Participants are reset to the INIT
    state. This is undesirable because in reality these tasks are dropped and
    should be scheduled on another instance. This is the Controller-side fix
    for the problem: when we detect tasks whose assigned Participants are no
    longer live, we mark them as DROPPED in their parent JobContext so that
    AssignableInstance does not consider them active when it is refreshed in
    the next pipeline run. This allows the dropped tasks to be reassigned to
    other instances.
    
    Note that a Participant-side fix must follow so that task partitions are
    put in the DROPPED state, not the INIT state, upon reset(). This fix alone
    does not resolve tasks stuck in INIT on the original Participant. However,
    by letting these tasks be assigned to other instances, it allows jobs and
    workflows to complete, at which point their CurrentStates are dropped
    altogether.
    
    Changelist:
    1. Mark task partitions whose assigned Participants are no longer live
       as DROPPED in JobContext
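The Controller-side logic described above can be sketched in simplified form. This is not Helix code. DroppedTaskMarker, markTasksOnDeadInstances, and the TaskState enum are hypothetical stand-ins for JobContext and its task partition states. Treating only INIT and RUNNING partitions as active is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the Controller-side fix: task partitions
 * whose assigned Participant is no longer live are marked DROPPED so they are
 * no longer counted as active and can be reassigned elsewhere.
 */
public final class DroppedTaskMarker {
  /** Simplified subset of task partition states (not Helix's own enum). */
  public enum TaskState { INIT, RUNNING, COMPLETED, DROPPED }

  /**
   * @param assignedInstance partition id -> instance the task was assigned to
   * @param states           partition id -> current state (updated in place)
   * @param liveInstances    names of instances that are currently live
   * @return number of partitions newly marked DROPPED
   */
  public static int markTasksOnDeadInstances(Map<Integer, String> assignedInstance,
      Map<Integer, TaskState> states, Set<String> liveInstances) {
    int dropped = 0;
    for (Map.Entry<Integer, String> e : assignedInstance.entrySet()) {
      int partition = e.getKey();
      TaskState state = states.get(partition);
      // Assumption: only INIT/RUNNING tasks count as active and need releasing.
      boolean active = state == TaskState.INIT || state == TaskState.RUNNING;
      if (active && !liveInstances.contains(e.getValue())) {
        states.put(partition, TaskState.DROPPED);
        dropped++;
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    Map<Integer, String> assigned = new HashMap<>();
    assigned.put(0, "host_a");
    assigned.put(1, "host_b"); // host_b has lost its ZK session
    Map<Integer, TaskState> states = new HashMap<>();
    states.put(0, TaskState.RUNNING);
    states.put(1, TaskState.RUNNING);
    int n = markTasksOnDeadInstances(assigned, states, Set.of("host_a"));
    System.out.println(n + " partition(s) marked DROPPED: " + states);
  }
}
```

Once a partition is DROPPED in this model, an assignment step that only counts active tasks no longer sees it occupying a slot, which is what lets it be rescheduled on a live instance.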

commit 59536d39c85d3535408a40a46a1a60a4105ee6e4
Author: narendly <narendly@...>
Date:   2018-11-02T21:19:16Z

    [HELIX-788] HELIX: Fix DefaultPipeline so that it doesn't rebalance task resources
    
    Helix CHO testing showed that the default pipeline was rebalancing task
    framework resources. This RB fixes that.
    Changelist:
    1. Change resourceMap to resourceToRebalance, which separates generic and
       task resources
    2. Make the logger use LogUtil to distinguish between the two pipelines
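A rough sketch of separating generic and task resources follows. It assumes, for illustration only, that task framework resources can be recognized by a state model definition named "Task". ResourceSplitter, selectResourcesToRebalance, and PipelineType are hypothetical names, not the actual DefaultPipeline code. The point is that each pipeline receives only the resources it is responsible for, so the default pipeline never rebalances task resources.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: split resources by state model so the default
 * pipeline rebalances only generic resources and the task pipeline
 * rebalances only task framework resources.
 */
public final class ResourceSplitter {
  // Assumption: task framework resources use a state model named "Task".
  public static final String TASK_STATE_MODEL = "Task";

  public enum PipelineType { DEFAULT, TASK }

  /**
   * @param resourceStateModels resource name -> state model definition name
   * @param pipeline            which pipeline is asking
   * @return the subset of resources that this pipeline should rebalance
   */
  public static Map<String, String> selectResourcesToRebalance(
      Map<String, String> resourceStateModels, PipelineType pipeline) {
    boolean wantTask = pipeline == PipelineType.TASK;
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> e : resourceStateModels.entrySet()) {
      boolean isTask = TASK_STATE_MODEL.equals(e.getValue());
      if (isTask == wantTask) {
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> resources = new HashMap<>();
    resources.put("MyDB", "MasterSlave");
    resources.put("MyWorkflow_MyJob", "Task");
    System.out.println("default pipeline: "
        + selectResourcesToRebalance(resources, PipelineType.DEFAULT).keySet());
    System.out.println("task pipeline: "
        + selectResourcesToRebalance(resources, PipelineType.TASK).keySet());
  }
}
```

Because the two subsets are disjoint by construction, a resource can never be rebalanced by both pipelines in the same cycle.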

----