GitHub user narendly opened a pull request:
https://github.com/apache/helix/pull/293
PR
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/narendly/helix master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/helix/pull/293.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #293
----
commit bced0996ed65c9a886b5e04788e2cc1c88fc37b1
Author: narendly <narendly@...>
Date: 2018-11-02T21:02:02Z
[HELIX-786] TEST: Make TestQuotaBasedScheduling stable
Recent changes made the Controller run slower, which made
TestQuotaBasedScheduling unstable. This RB stabilizes the test.
Changelist:
1. Use polling instead of Thread.sleep()
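The polling approach in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class name PollingSketch, the helper pollUntil, and the timeout and interval values are all assumptions made for the example. The idea is that the test waits only as long as it needs to, yet still tolerates a slower Controller up to the timeout.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of Helix.
public class PollingSketch {

  /**
   * Re-checks the condition every intervalMs until it holds or timeoutMs
   * has elapsed. Returns true if the condition held in time, false on
   * timeout or if the waiting thread was interrupted.
   */
  public static boolean pollUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean done = new AtomicBoolean(false);

    // Simulates asynchronous work whose duration the test cannot predict.
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      done.set(true);
    });
    worker.start();

    // Instead of a fixed Thread.sleep(...), wait up to 5 seconds for the flag.
    boolean ok = pollUntil(done::get, 5_000, 50);
    System.out.println("condition met: " + ok);
    worker.join();
  }
}
```

A fixed sleep has to be sized for the slowest expected run and still fails when the Controller is slower than that; a polling wait returns as soon as the condition holds and only fails after the full timeout.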
commit dc25bac1ebdcddb08aaab2765abfe72008b06a31
Author: narendly <narendly@...>
Date: 2018-11-02T21:03:16Z
[HELIX-786] TASK: Fix stuck tasks after Participant connection loss
When Helix Participants lose their ZK connection and enter a new ZK session,
all task partitions on those Participants are reset to the INIT state.
This is undesirable because in reality, these tasks are considered dropped and
should be scheduled on some other instance. This is the Controller side fix for
this problem: when we detect tasks whose assigned Participants are no longer
live, we mark them as DROPPED in their parent JobContext so that
AssignableInstance will not consider them active when it is refreshed in the
next pipeline. This enables these dropped tasks to be reassigned onto other
instances.
Note that a Participant-side fix must follow so that reset() leaves task
partitions in DROPPED state rather than INIT state. This change does not by
itself resolve the stuck INIT states on the original Participant. However, by
letting these tasks be assigned to other instances, it lets jobs and
workflows complete, at which point their CurrentStates are dropped altogether.
Changelist:
1. Mark task partitions whose assigned Participants are no longer live as
DROPPED in JobContext
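The Controller-side marking described above might look roughly like the sketch below. It is self-contained and does not use the real Helix classes: JobContextStub, PartitionState, and markTasksOnDeadInstances are hypothetical stand-ins invented for illustration, and treating only INIT and RUNNING partitions as candidates is an assumption, not something the commit message states.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: simplified stand-ins, not the Helix task framework API.
public class DroppedTaskSketch {

  public enum PartitionState { INIT, RUNNING, COMPLETED, DROPPED }

  /** Hypothetical stand-in for a job's per-partition bookkeeping. */
  public static class JobContextStub {
    public final Map<Integer, String> assignedParticipant = new HashMap<>();
    public final Map<Integer, PartitionState> partitionState = new HashMap<>();

    public void assign(int partition, String instance, PartitionState state) {
      assignedParticipant.put(partition, instance);
      partitionState.put(partition, state);
    }
  }

  /**
   * Marks every still-active task partition whose assigned instance is not
   * in liveInstances as DROPPED, and returns how many were marked.
   * Assumption: only INIT and RUNNING count as "active" here.
   */
  public static int markTasksOnDeadInstances(JobContextStub ctx, Set<String> liveInstances) {
    int marked = 0;
    for (Map.Entry<Integer, String> entry : ctx.assignedParticipant.entrySet()) {
      int partition = entry.getKey();
      PartitionState state = ctx.partitionState.get(partition);
      boolean active = state == PartitionState.INIT || state == PartitionState.RUNNING;
      if (active && !liveInstances.contains(entry.getValue())) {
        // A DROPPED partition no longer counts against the dead instance,
        // so a later assignment pass can place it elsewhere.
        ctx.partitionState.put(partition, PartitionState.DROPPED);
        marked++;
      }
    }
    return marked;
  }

  public static void main(String[] args) {
    JobContextStub ctx = new JobContextStub();
    ctx.assign(0, "host_a", PartitionState.RUNNING);
    ctx.assign(1, "host_b", PartitionState.INIT);
    ctx.assign(2, "host_b", PartitionState.COMPLETED);

    int marked = markTasksOnDeadInstances(ctx, Set.of("host_a"));
    System.out.println("marked as DROPPED: " + marked);
    System.out.println("partition states: " + ctx.partitionState);
  }
}
```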
commit 59536d39c85d3535408a40a46a1a60a4105ee6e4
Author: narendly <narendly@...>
Date: 2018-11-02T21:19:16Z
[HELIX-788] HELIX: Fix DefaultPipeline so that it doesn't rebalance task
resources
Helix CHO testing indicated that the default pipeline was rebalancing task
framework resources. This RB fixes this.
Changelist:
1. Change resourceMap to resourceToRebalance, which separates generic and
task resources
2. Make logger use LogUtil to distinguish two pipelines
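The separation in item 1 can be pictured as each pipeline selecting only the resources of its own kind before rebalancing. The sketch below is a simplified illustration under that assumption: ResourceSplitSketch, ResourceType, and resourcesToRebalance are hypothetical names, and the real change operates on Helix's own resource objects rather than a plain map of names to types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: not the Helix pipeline code.
public class ResourceSplitSketch {

  public enum ResourceType { GENERIC, TASK }

  /**
   * Returns only the resources that the given pipeline should rebalance,
   * preserving the iteration order of the input map.
   */
  public static Map<String, ResourceType> resourcesToRebalance(
      Map<String, ResourceType> allResources, ResourceType pipelineType) {
    Map<String, ResourceType> selected = new LinkedHashMap<>();
    for (Map.Entry<String, ResourceType> entry : allResources.entrySet()) {
      if (entry.getValue() == pipelineType) {
        selected.put(entry.getKey(), entry.getValue());
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    Map<String, ResourceType> all = new LinkedHashMap<>();
    all.put("MyDB", ResourceType.GENERIC);
    all.put("Workflow_Job1", ResourceType.TASK);
    all.put("MyIndex", ResourceType.GENERIC);

    // The default pipeline sees only generic resources.
    System.out.println(resourcesToRebalance(all, ResourceType.GENERIC).keySet());
    // The task pipeline sees only task framework resources.
    System.out.println(resourcesToRebalance(all, ResourceType.TASK).keySet());
  }
}
```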
----