narendly commented on a change in pull request #301: Add markdown for 
quota-based scheduling
URL: https://github.com/apache/helix/pull/301#discussion_r258758919
 
 

 ##########
 File path: website/0.8.3/src/site/markdown/quota_scheduling.md
 ##########
 @@ -0,0 +1,181 @@
+<!---
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Quota-based Task Scheduling
+
+
+## Introduction
+
+![Intro](/website/0.8.3/src/site/resources/images/quota_intro.png)
+
+Quota-based task scheduling is a feature addition to Helix Task Framework that 
enables users of Task Framework to apply the notion of categories in 
distributed task management.
+
+## Purpose
+
+As Helix Task Framework gains usage in other open-source frameworks such as 
[Apache Gobblin](https://gobblin.apache.org/) and [Apache 
Pinot](http://pinot.incubator.apache.org/), the variety of distributed tasks 
it manages has also grown. Users have also explicitly asked Helix for a way to 
differentiate task types by assigning each type a corresponding quota.
+
+Quota-based task scheduling aims to fulfill these requests by allowing users 
to define a quota profile consisting of quota types and their corresponding 
quotas. The goal of this feature is threefold: 1) let users prioritize one 
type of workflows/jobs/tasks over another, 2) achieve isolation among task 
types, and 3) make monitoring easier by tracking the status of distributed 
execution by type.
+
+
+
+## Glossary and Definitions
+
+-   Quota config: a set of string-integer mappings that indicate the quota 
resource type, quota types, and corresponding quotas.
+-   Quota resource type: this defines the type of resource you want to set a 
quota around.
+    -   E.g.) # of threads, CPU resource, memory.
+-   Quota type: a String literal referring to a user-defined type, each of 
which has a corresponding quota attached to it. Note that there is also a 
DEFAULT type, under which jobs without a defined type fall.
+    -   E.g.) TYPE_0, TYPE_1, ..., DEFAULT
+-   Quota: a number referring to a relative ratio that determines what portion 
of given resources should be allotted to a particular quota type.
+    -   E.g.) TYPE_0: 40, TYPE_1: 20, ..., DEFAULT: 40
+
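To make the "relative ratio" definition concrete, the following is a minimal sketch of how quota ratios could be turned into per-type thread counts. The class `QuotaRatioSketch`, the method `threadsPerType`, and the rounding rule are assumptions made for illustration and are not taken from the Helix code base; the example also assumes only the three types shown and a total of 40 threads per Participant.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not a Helix class.
class QuotaRatioSketch {

    // Converts relative quota ratios into absolute thread counts.
    // Assumption: each share is rounded to the nearest whole thread.
    static Map<String, Integer> threadsPerType(Map<String, Integer> ratios, int totalThreads) {
        int ratioSum = 0;
        for (int ratio : ratios.values()) {
            ratioSum += ratio;
        }
        if (ratioSum <= 0) {
            throw new IllegalArgumentException("quota ratios must sum to a positive number");
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : ratios.entrySet()) {
            double share = (double) totalThreads * entry.getValue() / ratioSum;
            result.put(entry.getKey(), (int) Math.round(share));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> ratios = new LinkedHashMap<>();
        ratios.put("TYPE_0", 40);
        ratios.put("TYPE_1", 20);
        ratios.put("DEFAULT", 40);
        // prints {TYPE_0=16, TYPE_1=8, DEFAULT=16}
        System.out.println(threadsPerType(ratios, 40));
    }
}
```

Because the quotas are ratios rather than absolute numbers, the values 40/20/40 and 4/2/4 would produce the same split in this sketch.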
+
+## Architecture
+
+### AssignableInstance
+
+AssignableInstance is an abstraction that represents each live Participant 
that is able to take on tasks from the Controller. Each AssignableInstance 
caches the tasks it is currently running as well as the remaining task counts 
derived from the quota-based capacity calculation.
+
+### AssignableInstanceManager
+
+AssignableInstanceManager manages all AssignableInstances. It also serves as a 
connecting layer between the Controller and each AssignableInstance. 
AssignableInstanceManager also provides a set of interfaces that allows the 
Controller to easily determine whether an AssignableInstance is able to take on 
more tasks.
+
+### TaskAssigner
+
+The TaskAssigner interface provides basic API methods for assigning tasks 
based on quota constraints. Currently, Task Framework only considers the 
number of Participant-side JVM threads, each of which corresponds to an 
active task.
+
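The thread-count constraint described above can be pictured with a small sketch. It is not the actual `TaskAssigner` or `AssignableInstance` API: the class `ThreadQuotaInstanceSketch` and its methods `tryAssign`, `release`, and `remainingFor` are hypothetical names, and the choice to reject quota types that are not configured is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Helix AssignableInstance class.
class ThreadQuotaInstanceSketch {
    static final String DEFAULT_TYPE = "DEFAULT";

    // Remaining thread count per quota type on one Participant.
    private final Map<String, Integer> remaining = new HashMap<>();

    ThreadQuotaInstanceSketch(Map<String, Integer> threadsPerType) {
        remaining.putAll(threadsPerType);
    }

    // Tasks without a quota type fall under DEFAULT.
    private static String resolve(String quotaType) {
        return quotaType == null ? DEFAULT_TYPE : quotaType;
    }

    // Occupies one thread of the given type if any is left.
    // Assumption: a type missing from the config has no capacity.
    boolean tryAssign(String quotaType) {
        String type = resolve(quotaType);
        int left = remaining.getOrDefault(type, 0);
        if (left <= 0) {
            return false;
        }
        remaining.put(type, left - 1);
        return true;
    }

    // Returns one thread to the given type when a task finishes.
    void release(String quotaType) {
        String type = resolve(quotaType);
        if (remaining.containsKey(type)) {
            remaining.merge(type, 1, Integer::sum);
        }
    }

    int remainingFor(String quotaType) {
        return remaining.getOrDefault(resolve(quotaType), 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> capacity = new HashMap<>();
        capacity.put("TYPE_0", 2);
        capacity.put(DEFAULT_TYPE, 1);
        ThreadQuotaInstanceSketch instance = new ThreadQuotaInstanceSketch(capacity);
        System.out.println(instance.tryAssign("TYPE_0")); // true
        System.out.println(instance.tryAssign("TYPE_0")); // true
        System.out.println(instance.tryAssign("TYPE_0")); // false, TYPE_0 is exhausted
        System.out.println(instance.tryAssign(null));     // true, counted against DEFAULT
    }
}
```

Keeping a separate counter per type is what gives isolation in this sketch: exhausting TYPE_0 does not consume the threads reserved for DEFAULT.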
+### RuntimeJobDag (JobDagIterator)
+
+This new component serves as an iterator over JobDAGs for the Controller. 
Previously, task assignment required the Controller to iterate through all jobs 
and their underlying tasks to determine whether there were any tasks that 
needed to be assigned and scheduled. This proved to be inefficient and did not 
scale with the increasing load we were putting on Task Framework. Each 
RuntimeJobDag keeps state; that is, it knows which task needs to be offered 
up to the Controller for scheduling next. This saves the Controller redundant 
computation every time it goes through the TaskSchedulingStage of the Task 
pipeline.
+
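The idea of a stateful DAG iterator can be sketched as follows. This is only an illustration of the technique (tracking unfinished parents so that ready jobs are handed out without rescanning the whole DAG); the class `JobDagIteratorSketch` and its method names are assumptions and do not describe the real RuntimeJobDag implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Helix RuntimeJobDag class.
class JobDagIteratorSketch {
    private final Map<String, List<String>> children = new LinkedHashMap<>();
    private final Map<String, Integer> unfinishedParents = new LinkedHashMap<>();
    private final Deque<String> ready = new ArrayDeque<>();

    void addJob(String job) {
        children.putIfAbsent(job, new ArrayList<>());
        unfinishedParents.putIfAbsent(job, 0);
    }

    // Declares that child may only run after parent has finished.
    void addDependency(String parent, String child) {
        addJob(parent);
        addJob(child);
        children.get(parent).add(child);
        unfinishedParents.merge(child, 1, Integer::sum);
    }

    // Call once after the DAG is built: jobs without parents become ready.
    void start() {
        for (Map.Entry<String, Integer> entry : unfinishedParents.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
    }

    // Returns the next schedulable job, or null if none is ready right now.
    String getNextJob() {
        return ready.poll();
    }

    // Marks a job as finished and releases children whose parents are all done.
    void finishJob(String job) {
        for (String child : children.get(job)) {
            int left = unfinishedParents.merge(child, -1, Integer::sum);
            if (left == 0) {
                ready.add(child);
            }
        }
    }

    public static void main(String[] args) {
        JobDagIteratorSketch dag = new JobDagIteratorSketch();
        dag.addDependency("A", "B");
        dag.addDependency("A", "C");
        dag.addDependency("B", "D");
        dag.addDependency("C", "D");
        dag.start();
        System.out.println(dag.getNextJob()); // A
        dag.finishJob("A");
        System.out.println(dag.getNextJob()); // B
        System.out.println(dag.getNextJob()); // C
    }
}
```

In this sketch each call to `getNextJob` is a constant-time queue poll, and the work done by `finishJob` is proportional to the number of children of the finished job rather than to the size of the whole DAG.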
+![Architecture](/website/0.8.3/src/site/resources/images/quota_InstanceCapacityManager.jpeg)
+
+## User Manual
+
+### Review of Task Framework Glossary
+
+-   Task Framework: a component of Apache Helix. A framework on which users 
can define and run workflows, jobs, and tasks in a distributed way.
+-   Workflow: the largest unit of work in Task Framework. A workflow consists 
of one or more jobs. There are two types of workflows:
+    -   Generic workflow: a generic workflow is a workflow consisting of jobs 
(a job DAG) that are used for general purposes. A generic workflow may be 
removed if expired or timed out.
+    -   Job queue: a job queue is a special type of workflow consisting of 
jobs that tend to have a linear dependency (this dependency is configurable, 
however). A job queue does not expire; it lives on until it is deleted.
+-   Job: the second largest unit of work in Task Framework. A job consists of 
one or more mutually independent tasks. There are two types of jobs:
+    -   Generic job: a generic job is a job consisting of one or more tasks.
+    -   Targeted job: a targeted job differs from a generic job in that it 
must have a _target resource_, and the tasks belonging to such a job will 
be scheduled alongside the partitions of the target resource. To illustrate, an 
Espresso user of Task Framework may wish to schedule a backup job on one of 
their DBs called _MemberDataDB_. This DB will be divided into multiple 
partitions (_MemberDataDB_1_, _MemberDataDB_2_, ... _MemberDataDB_N_). 
Suppose that a targeted job is submitted such that its tasks are paired up 
with each of those partitions. This "pairing-up" is necessary because each task 
is a backup task that needs to be on the same physical machine as the 
partition it is backing up.
+-   Task: the smallest unit of work in Task Framework. A task is an 
independent unit of work.
+-   Quota resource type: denotes a particular type of resource. Examples would 
be JVM thread count, memory, CPU resources, etc. Generally, each task that 
runs on a Helix Participant (= instance, worker, node) occupies a set amount of 
resources. **Note that JVM thread count is the only quota resource type 
currently supported by Task Framework, with each task occupying 1 thread out of 
40 threads available per Helix Participant (instance).**
 
 Review comment:
   Combined the two glossary sections into one.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
