[ https://issues.apache.org/jira/browse/HELIX-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061392#comment-16061392 ]

ASF GitHub Bot commented on HELIX-655:
--------------------------------------

GitHub user jiajunwang opened a pull request:

    https://github.com/apache/helix/pull/100

    [HELIX-655] Helix per-participant concurrent task throttling

    Add per-participant concurrent task throttling.
    
    Add a participant configuration item "MAX_CONCURRENT_TASK" for throttling.
    Newly assigned tasks + existing running/init tasks <= MAX_CONCURRENT_TASK;
    otherwise, the new assignment is not included in the best possible state.
    Tasks are assigned in order of their jobs' start times, so older jobs have
    higher priority than newer ones.
    Add a test case (TestTaskThrottling.java) covering the new throttling and
    priority behavior.
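    
    For illustration, a minimal sketch of the throttle check described above
    (all names here are hypothetical, not the actual rebalancer internals):
    
        import java.util.List;
        
        public class ThrottleSketch {
          // Admit candidate tasks, pre-sorted by job start time (older first),
          // until the participant's MAX_CONCURRENT_TASK budget is used up.
          static List<String> admit(List<String> candidatesByJobStartTime,
                                    int runningOrInitCount,
                                    int maxConcurrentTask) {
            int capacity = Math.max(0, maxConcurrentTask - runningOrInitCount);
            // Tasks beyond the capacity stay out of the best possible state.
            return candidatesByJobStartTime.subList(
                0, Math.min(capacity, candidatesByJobStartTime.size()));
          }
        }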
    
    Ticket:
    https://issues.apache.org/jira/browse/HELIX-655
    
    Test:
    mvn test in helix-core
    
    Please refer to previous discussions in another pull request:
    https://github.com/apache/helix/pull/89

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiajunwang/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #100
    
----
commit 20685cf6e1276aaa1bf6264c1fe6a3173081d22c
Author: Jiajun Wang <jjw...@linkedin.com>
Date:   2017-05-31T21:57:23Z

    Helix per-participant concurrent task throttling
    
    Add per-participant concurrent task throttling.
    
    1. Add a participant configuration item "MAX_CONCURRENT_TASK" for the
       throttling setting.
       Add a cluster configuration item "MAX_CONCURRENT_TASK_PER_INSTANCE" as
       the cluster-wide default throttling setting.
       Newly assigned tasks + existing running/init tasks <= MAX_CONCURRENT_TASK;
       otherwise, the new assignment is not included in the best possible state.
    2. Tasks are assigned in order of their jobs' start times. Older jobs have
       higher priority than newer jobs and regular resources.
    3. Add a test case (TestTaskThrottling.java) covering the new throttling and
       priority behavior.
    
    Ticket:
    https://issues.apache.org/jira/browse/HELIX-655
    
    Test:
    mvn test in helix-core
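    
    For reference, a hedged sketch of how an application might set these two
    items with Helix's ConfigAccessor (cluster name, instance name, ZooKeeper
    address, and values are placeholders; exact constructors can vary across
    Helix versions):
    
        import org.apache.helix.ConfigAccessor;
        import org.apache.helix.manager.zk.ZNRecordSerializer;
        import org.apache.helix.manager.zk.ZkClient;
        import org.apache.helix.model.HelixConfigScope;
        import org.apache.helix.model.HelixConfigScope.ConfigScopeProperty;
        import org.apache.helix.model.builder.HelixConfigScopeBuilder;
        
        public class ThrottleConfigExample {
          public static void main(String[] args) {
            ZkClient zkClient = new ZkClient("localhost:2181");
            zkClient.setZkSerializer(new ZNRecordSerializer());
            ConfigAccessor accessor = new ConfigAccessor(zkClient);
        
            // Cluster-wide default applied to every participant.
            HelixConfigScope clusterScope =
                new HelixConfigScopeBuilder(ConfigScopeProperty.CLUSTER)
                    .forCluster("MyCluster").build();
            accessor.set(clusterScope, "MAX_CONCURRENT_TASK_PER_INSTANCE", "40");
        
            // Per-participant override.
            HelixConfigScope participantScope =
                new HelixConfigScopeBuilder(ConfigScopeProperty.PARTICIPANT)
                    .forCluster("MyCluster").forParticipant("localhost_12918")
                    .build();
            accessor.set(participantScope, "MAX_CONCURRENT_TASK", "10");
        
            zkClient.close();
          }
        }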

commit e35fe4fffc952f7ccae7bfa4cbf89ef75e404a53
Author: Jiajun Wang <jjw...@linkedin.com>
Date:   2017-06-03T06:26:50Z

    Add a workflow configuration option to allow or disallow assigning multiple
    jobs of one workflow to the same instance.
    
    By default, Helix does not assign multiple jobs from the same workflow to
    the same instance.
    If the option is set to true, an instance can execute multiple jobs from
    the same workflow concurrently.
    
    When an application sets a max-task throttle on the participants, allowing
    overlapping assignment can maximize utilization.
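    
    A hedged usage sketch of the new option (the setter name below is an
    assumption inferred from this commit message, not confirmed API):
    
        import org.apache.helix.task.WorkflowConfig;
        
        // Allow one instance to run jobs from the same workflow concurrently,
        // so a participant throttled by MAX_CONCURRENT_TASK stays fully used.
        WorkflowConfig config = new WorkflowConfig.Builder()
            .setAllowOverlapJobAssignment(true) // assumed setter for this option
            .build();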

----


> Helix per-participant concurrent task throttling
> ------------------------------------------------
>
>                 Key: HELIX-655
>                 URL: https://issues.apache.org/jira/browse/HELIX-655
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>    Affects Versions: 0.6.x
>            Reporter: Jiajun Wang
>            Assignee: Junkai Xue
>
> h1. Overview
> Currently, all runnable jobs/tasks in Helix are treated equally: they are all
> scheduled according to the rebalancer algorithm. Their assignments may differ,
> but they will all end up in the RUNNING state.
> This may cause an issue if there are too many concurrently runnable jobs. When
> the Helix controller starts all these jobs, the instances may be overloaded as
> they allocate resources and execute all the tasks at once. As a result, the
> jobs won't be able to finish in a reasonable time window.
> The issue is even more critical for long-running jobs. According to our meeting
> with the Gobblin team, they allocate resources for a job when it is scheduled.
> So in the situation described above, more and more resources will be reserved
> for the pending jobs, and the cluster will soon be exhausted.
> To work around the problem, an application has to schedule jobs at a relatively
> low frequency (which is what Gobblin does now). This may cause low utilization.
> A better way to fix this issue, at the framework level, is to throttle the
> jobs/tasks that run concurrently and to allow setting priorities for different
> jobs to control their total execution time. Given the same number of jobs, the
> cluster then stays in better condition, and the jobs running in it have a more
> predictable execution time.
> Existing related control mechanisms are (see the sketch after this list):
> * ConcurrentTasksPerInstance for each job
> * ParallelJobs for each workflow
> * Thread pool size limits on the participant, if the user customizes
> TaskStateModelFactory
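> For comparison, a hedged sketch of those existing knobs using the standard
> task framework builders (command and values are placeholders):
>
>     import org.apache.helix.task.JobConfig;
>     import org.apache.helix.task.WorkflowConfig;
>
>     // Cap how many tasks of ONE job may run on each instance.
>     JobConfig.Builder jobBuilder = new JobConfig.Builder()
>         .setCommand("MyTaskCommand")
>         .setNumConcurrentTasksPerInstance(4);
>
>     // Cap how many jobs of ONE workflow may run in parallel.
>     WorkflowConfig.Builder wfBuilder = new WorkflowConfig.Builder()
>         .setParallelJobs(2);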
> But none of them helps directly when the number of concurrent workflows or jobs
> is large. If an application keeps scheduling jobs/job queues, Helix will start
> any runnable job without considering the workload on the participants.
> The application may be able to configure these items carefully to achieve the
> goal, but it won't easily find the sweet spot, especially when the cluster
> keeps changing (scaling out, etc.).
> h2. Problem summary
> # All runnable tasks will start executing, which may overload the participants.
> # Applications need a mechanism to prioritize important jobs (or workflows);
> otherwise, important tasks may be blocked by less important ones, and the
> allocated resources are wasted.
> h2. Feature proposed
> Based on our discussion, we propose two features that can help resolve the
> issue:
> # Running task throttling on each participant, to avoid overloading it.
> # Job priority control that ensures high-priority jobs are scheduled earlier.
> In addition, applications can leverage workflow/job monitoring items as
> feedback from Helix to adjust their strategy.


