[
https://issues.apache.org/jira/browse/HELIX-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545636#comment-16545636
]
Hudson commented on HELIX-730:
------------------------------
FAILURE: Integrated in Jenkins build helix #1512 (See
[https://builds.apache.org/job/helix/1512/])
[HELIX-730] Add ThreadCountBasedAssignmentCalculator and integrate with
(narendly: rev 4db61b56e473b64ec9956f694dd2ac6a8d328ed4)
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTimeout.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeoutTaskNotStarted.java
* (add) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
* (add) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java
* (edit) helix-core/src/main/java/org/apache/helix/task/JobConfig.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureTaskNotStarted.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerRetryLimit.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowJobDependency.java
* (edit) helix-core/src/test/java/org/apache/helix/task/TestSemiAutoStateTransition.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailure.java
* (edit) helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java
* (edit) helix-core/src/main/java/org/apache/helix/task/FixedTargetTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestTargetExternalView.java
* (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTermination.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/TestBatchEnableInstances.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeout.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureHighThreshold.java
* (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteWorkflow.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestRebalanceRunningTask.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/TestStateTransitionCancellation.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRetryDelay.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/manager/TestZkHelixAdmin.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestStopWorkflow.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerFailover.java
* (add) helix-core/src/main/java/org/apache/helix/task/ThreadCountBasedTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java
* (delete) helix-core/src/test/java/org/apache/helix/integration/task/TestGenericTaskAssignmentCalculator.java
* (edit) helix-core/src/test/java/org/apache/helix/task/TaskSynchronizedTestBase.java
> [TASK] Add ThreadCountBasedAssignmentCalculator and integrate with
> Workflow/JobRebalancer and fix rebalancing logic
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HELIX-730
> URL: https://issues.apache.org/jira/browse/HELIX-730
> Project: Apache Helix
> Issue Type: Improvement
> Reporter: Hunter L
> Priority: Major
>
> For quota-based scheduling of tasks, we have added the TaskAssigner interface,
> which takes AssignableInstances into account by way of
> AssignableInstanceManager. To use it in the existing pipeline prior to
> Task Framework 2.0, GenericTaskAssignmentCalculator was replaced with
> ThreadCountBasedTaskAssignmentCalculator, which is a wrapper around
> TaskAssigner. The necessary adjustments for this replacement were made in
> Workflow/JobRebalancer, and the rebalance logic there was also reviewed and
> fixed. Additionally, TestQuotaBasedScheduling was added to test quota-based
> task scheduling. Note that quotas will apply to both generic and
> targeted jobs.
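
To make the idea in the paragraph above concrete, here is a minimal sketch of thread-count-based assignment under a per-type quota. It is not Helix code: QuotaAssignmentSketch, InstanceCapacity, tryAcquire, release, and the quota type "TYPE_A" are all hypothetical names chosen for illustration, and the first-fit strategy is an assumption rather than a description of what ThreadCountBasedTaskAssigner actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Helix APIs.
final class QuotaAssignmentSketch {

  /** Tracks the free threads per quota type on one instance (illustrative only). */
  static final class InstanceCapacity {
    final String name;
    final Map<String, Integer> freeThreads = new LinkedHashMap<>();

    InstanceCapacity(String name, Map<String, Integer> threadsPerQuotaType) {
      this.name = name;
      this.freeThreads.putAll(threadsPerQuotaType);
    }

    /** Reserves one thread of the given quota type; returns false if none is free. */
    boolean tryAcquire(String quotaType) {
      int free = freeThreads.getOrDefault(quotaType, 0);
      if (free <= 0) {
        return false;
      }
      freeThreads.put(quotaType, free - 1);
      return true;
    }

    /** Returns one thread of the given quota type, e.g. when a task finishes. */
    void release(String quotaType) {
      freeThreads.merge(quotaType, 1, Integer::sum);
    }
  }

  /**
   * First-fit assignment: each task goes to the first instance that still has
   * a free thread for the quota type. Tasks that fit nowhere are left out of
   * the result and would be retried in a later pass.
   */
  static Map<String, String> assign(
      List<String> taskIds, String quotaType, List<InstanceCapacity> instances) {
    Map<String, String> taskToInstance = new LinkedHashMap<>();
    for (String taskId : taskIds) {
      for (InstanceCapacity instance : instances) {
        if (instance.tryAcquire(quotaType)) {
          taskToInstance.put(taskId, instance.name);
          break;
        }
      }
    }
    return taskToInstance;
  }

  public static void main(String[] args) {
    List<InstanceCapacity> instances = new ArrayList<>();
    instances.add(new InstanceCapacity("instance_0", Collections.singletonMap("TYPE_A", 2)));
    instances.add(new InstanceCapacity("instance_1", Collections.singletonMap("TYPE_A", 1)));
    // Four tasks compete for three threads, so one task stays unassigned.
    System.out.println(
        assign(Arrays.asList("task_0", "task_1", "task_2", "task_3"), "TYPE_A", instances));
  }
}
```

The release() call is the counterpart of the reservation: if it is never invoked for finished or inactive jobs, capacity stays reserved indefinitely, which is the kind of resource leak the changelist below mentions.
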
> A few bugs were uncovered during this process, such as faulty retry logic
> that never actually restarted tasks. For more details, see the changelist
> below:
> Changelist:
> 1. Add ThreadCountBasedTaskAssignmentCalculator, a wrapper around
> ThreadCountBasedTaskAssigner
> 2. Make logic changes in JobRebalancer to enable the use of
> ThreadCountBasedTaskAssignmentCalculator
> 3. Fix the failing test by using a thread-safe map, and rename
> TestGenericTaskAssignmentCalculator to TestTaskAssignmentCalculator to better
> reflect what it tests
> 4. Add retry logic that was previously absent for INIT and DROPPED tasks
> in JobRebalancer
> 5. Add TestQuotaBasedScheduling to test that jobs and tasks are
> assigned and scheduled per the quota config set in ClusterConfig
> 6. Add more log messages to aid with task-scheduling debugging in
> AssignableInstance
> 7. In AbstractTaskDispatcher, implement retry logic for tasks that are
> STOPPED, TIMED_OUT, or TASK_ERROR so that they are restarted
> correctly
> 8. In AbstractTaskDispatcher, when enforcing overlapAssign for jobs with
> isAllowOverlapAssignment(), a fix was implemented so that only jobs whose
> state is IN_PROGRESS are considered
> 9. In AbstractTaskDispatcher, the isWorkflowFinished() method was modified so
> that non-active jobs have their tasks' resources freed from
> AssignableInstances to prevent a resource leak
> 10. In markJobFailed() and markJobCompleted(), non-active jobs now have
> their tasks' resources freed from AssignableInstances to prevent a resource leak
> 11. Fix the logic so that quotas do not apply to targeted jobs
> 12. Fix TestTaskRebalancer (assumes Consistent Hashing, which is no longer
> used)
> 13. Fix TestIndependentTaskRebalancer (assumes Consistent Hashing, no
> longer used)
> 14. Assignment logic was improved so that incomplete tasks whose assigned
> participants are no longer live will be re-assigned accordingly
> 15. Fix TestTaskRebalancerFailover (tasks on non-live instances will be
> re-assigned promptly)
> 16. Fix TestRebalanceRunningTask (targeted jobs will get tasks reassigned
> upon liveInstance and currentState change)
> 17. Fix a bug in FixedTargetTaskAssignmentCalculator and the assignment logic
> for targeted jobs so that a task index is no longer assigned multiple times
> 18. Fix TestJobFailureTaskNotStarted (tasks were not being assigned at all
> due to having reached maximum capacity for quota)
> 19. Add a targetedTaskConfigMap field in JobConfig to cache TaskConfig
> objects for targeted tasks, reducing object creation and GC overhead
> 20. Fix JobConfig so that it doesn't write quotaType to ZooKeeper when
> quotaType is null or not set
> 21. Fix deleteWorkflow() in TaskUtil so that the first delete failure
> marks the entire method as failed (and returns early so that an incomplete
> deletion does not break other ZNodes)
> 22. Fix TestDeleteWorkflow by adding another removeProperty() clause to
> lower failure rate
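
As an illustration of the fail-fast behaviour described in item 21, the following is a minimal sketch under stated assumptions. FailFastDeleteSketch, deleteAll, and the example paths are hypothetical and are not taken from TaskUtil; the Predicate stands in for whatever call actually removes a ZNode and reports success or failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a fail-fast multi-step delete; not Helix code.
final class FailFastDeleteSketch {

  /**
   * Removes the given paths in order. On the first failed removal the method
   * stops and returns false, leaving the remaining paths untouched. Paths that
   * were removed successfully are appended to removedOut.
   */
  static boolean deleteAll(
      List<String> paths, Predicate<String> remover, List<String> removedOut) {
    for (String path : paths) {
      if (!remover.test(path)) {
        // Stop here: continuing could leave dependent nodes half-deleted.
        return false;
      }
      removedOut.add(path);
    }
    return true;
  }

  public static void main(String[] args) {
    // Made-up paths; the second removal is simulated as failing.
    List<String> paths = Arrays.asList("/example/ideal", "/example/config", "/example/context");
    List<String> removed = new ArrayList<>();
    boolean ok = deleteAll(paths, path -> !path.endsWith("config"), removed);
    System.out.println("success=" + ok + ", removed=" + removed);
  }
}
```

Returning on the first failure means the caller sees a single clear result and can retry the whole operation, rather than discovering later that only some of the nodes were removed.
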