[ 
https://issues.apache.org/jira/browse/HELIX-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545636#comment-16545636
 ] 

Hudson commented on HELIX-730:
------------------------------

FAILURE: Integrated in Jenkins build helix #1512 (See 
[https://builds.apache.org/job/helix/1512/])
[HELIX-730] Add ThreadCountBasedAssignmentCalculator and integrate with 
(narendly: rev 4db61b56e473b64ec9956f694dd2ac6a8d328ed4)
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTimeout.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeoutTaskNotStarted.java
* (add) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
* (add) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java
* (edit) helix-core/src/main/java/org/apache/helix/task/JobConfig.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureTaskNotStarted.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerRetryLimit.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowJobDependency.java
* (edit) helix-core/src/test/java/org/apache/helix/task/TestSemiAutoStateTransition.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailure.java
* (edit) helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java
* (edit) helix-core/src/main/java/org/apache/helix/task/FixedTargetTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestTargetExternalView.java
* (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTermination.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/TestBatchEnableInstances.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeout.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureHighThreshold.java
* (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteWorkflow.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestRebalanceRunningTask.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/TestStateTransitionCancellation.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRetryDelay.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/manager/TestZkHelixAdmin.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestStopWorkflow.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerFailover.java
* (add) helix-core/src/main/java/org/apache/helix/task/ThreadCountBasedTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java
* (delete) helix-core/src/test/java/org/apache/helix/integration/task/TestGenericTaskAssignmentCalculator.java
* (edit) helix-core/src/test/java/org/apache/helix/task/TaskSynchronizedTestBase.java


> [TASK] Add ThreadCountBasedAssignmentCalculator and integrate with 
> Workflow/JobRebalancer and fix rebalancing logic
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HELIX-730
>                 URL: https://issues.apache.org/jira/browse/HELIX-730
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Hunter L
>            Priority: Major
>
> For quota-based scheduling of tasks, we have added the TaskAssigner interface, 
> which takes AssignableInstances into account by way of 
> AssignableInstanceManager. To use it in the existing pipeline, prior to Task 
> Framework 2.0, GenericTaskAssignmentCalculator was replaced with 
> ThreadCountBasedTaskAssignmentCalculator, a wrapper around TaskAssigner. 
> Workflow/JobRebalancer was adjusted for this replacement, and its rebalance 
> logic was reviewed and fixed. TestQuotaBasedScheduling was added to test 
> quota-based task scheduling. Note that quotas will apply to both generic and 
> targeted jobs.
> Several bugs were uncovered during this process, such as faulty retry logic 
> that never actually restarted tasks. For more details, see the changelist 
> below:
> Changelist:
>     1. Add ThreadCountBasedTaskAssignmentCalculator, a wrapper around 
> ThreadCountBasedTaskAssigner
>     2. Make logic changes in JobRebalancer to enable the use of 
> ThreadCountBasedTaskAssignmentCalculator
>     3. Fix the failing test by using a thread-safe map and rename 
> TestGenericTaskAssignmentCalculator to TestTaskAssignmentCalculator to better 
> reflect what its tests are doing
>     4. Add retry logic that was previously absent for INIT and DROPPED tasks 
> in JobRebalancer
>     5. Add TestQuotaBasedScheduling to test that jobs and tasks are 
> assigned and scheduled per the quota config set in ClusterConfig
>     6. Add more log messages to aid with task-scheduling debugging in 
> AssignableInstance
>     7. In AbstractTaskDispatcher, implement retry logic for tasks that are 
> STOPPED, TIMED_OUT, or TASK_ERROR so that they are restarted correctly
>     8. In AbstractTaskDispatcher, when enforcing overlapAssign for jobs with 
> isAllowOverlapAssignment(), consider only jobs whose state is IN_PROGRESS
>     9. In AbstractTaskDispatcher, modify isWorkflowFinished() so that 
> non-active jobs have their tasks' resources freed from AssignableInstances 
> to prevent resource leaks
>    10. In markJobFailed() and markJobCompleted(), free the resources held by 
> the tasks of non-active jobs from AssignableInstances to prevent resource 
> leaks
>    11. Fix the logic so that quotas do not apply to targeted jobs
>    12. Fix TestTaskRebalancer (assumes Consistent Hashing, which is no longer 
> used)
>    13. Fix TestIndependentTaskRebalancer (assumes Consistent Hashing, no 
> longer used)
>    14. Improve the assignment logic so that incomplete tasks whose assigned 
> participants are no longer live are re-assigned
>    15. Fix TestTaskRebalancerFailover (tasks on non-live instances will be 
> re-assigned promptly)
>    16. Fix TestRebalanceRunningTask (targeted jobs will get tasks reassigned 
> upon liveInstance and currentState change)
>    17. Fix a bug in FixedTargetTaskAssignmentCalculator and the assignment 
> logic for targeted jobs so that a task index is no longer assigned multiple 
> times
>    18. Fix TestJobFailureTaskNotStarted (tasks were not being assigned at all 
> due to having reached maximum capacity for quota)
>    19. Add targetedTaskConfigMap field in JobConfig to cache TaskConfig 
> objects for targeted tasks to reduce object creation and GC overhead
>    20. Fix JobConfig so that it doesn't write quotaType to ZooKeeper when 
> quotaType is null or not set
>    21. Fix deleteWorkflow() in TaskUtil so that the first delete failure 
> causes the whole method to fail and return early, so that an incomplete 
> deletion does not leave other ZNodes in a broken state
>    22. Fix TestDeleteWorkflow by adding another removeProperty() clause to 
> lower its failure rate
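
For readers unfamiliar with thread-count-based quotas, here is a minimal sketch of the idea behind the description and items 9 and 10 above: each instance has a limited number of free threads, a task takes one thread when it is assigned, and the thread has to be released when the task's job is no longer active, otherwise capacity leaks. This is not the Helix implementation. The class ThreadQuotaSketch, its methods, and the "most free threads first" placement rule are all hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; NOT Helix's ThreadCountBasedTaskAssigner.
final class ThreadQuotaSketch {
  private ThreadQuotaSketch() {}

  /**
   * Assigns each task to the instance that currently has the most free threads
   * (an assumed placement rule). freeThreads maps instance name to remaining
   * thread quota and is updated in place. Tasks that cannot be placed because
   * every instance is at capacity are left out of the returned map.
   */
  static Map<String, String> assign(List<String> tasks, Map<String, Integer> freeThreads) {
    Map<String, String> assignment = new LinkedHashMap<>();
    for (String task : tasks) {
      String best = null;
      for (Map.Entry<String, Integer> e : freeThreads.entrySet()) {
        boolean hasCapacity = e.getValue() > 0;
        if (hasCapacity && (best == null || e.getValue() > freeThreads.get(best))) {
          best = e.getKey();
        }
      }
      if (best == null) {
        break; // quota exhausted everywhere; remaining tasks stay unassigned
      }
      freeThreads.merge(best, -1, Integer::sum); // the task occupies one thread
      assignment.put(task, best);
    }
    return assignment;
  }

  /**
   * Releases the thread held by a task, for example when its job has failed
   * or completed. Skipping this step is the kind of resource leak the
   * changelist describes.
   */
  static void release(String task, Map<String, String> assignment,
      Map<String, Integer> freeThreads) {
    String instance = assignment.remove(task);
    if (instance != null) {
      freeThreads.merge(instance, 1, Integer::sum);
    }
  }
}
```

In this sketch a task that finds no free thread simply stays unassigned until release() is called for an earlier task, which mirrors the symptom in item 18, where tasks were not being assigned at all once the quota had reached maximum capacity.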



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
