narendly commented on a change in pull request #301: Add markdown for quota-based scheduling URL: https://github.com/apache/helix/pull/301#discussion_r258758919
########## File path: website/0.8.3/src/site/markdown/quota_scheduling.md ########## @@ -0,0 +1,181 @@ +<!--- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Quota-based Task Scheduling + + +## Introduction + +![Intro](/website/0.8.3/src/site/resources/images/quota_intro.png) + +Quota-based task scheduling is a feature addition to Helix Task Framework that enables users of Task Framework to apply the notion of categories in distributed task management. + +## Purpose + +As Helix Task Framework gains usage in other open-source frameworks such as [Apache Gobblin](https://gobblin.apache.org/) and [Apache Pinot](http://pinot.incubator.apache.org/), it has also seen an increase in the variety in the types of distributed tasks it was managing. There have also been explicit feature requests to Helix for differentiating different types of tasks by creating corresponding quotas. + +Quota-based task scheduling aims to fulfill these requests by allowing users to define a quota profile consisting of quota types and their corresponding quotas. The goal of this feature is threefold: 1) the user will have the ability to prioritize one type of workflows/jobs/tasks over another and 2) achieve isolation among the type of tasks and 3) make monitoring easier by tracking the status of distributed execution by type. + + + +## Glossary and Definitions + +- Quota config: a set of string-integer mappings that indicate the quota resource type, quota types, and corresponding quotas. +- Quota resource type: this defines the type of resource you want to set a quota around. + - E.g.) # of threads, CPU resource, memory. +- Quota type: a String literal referring to user-defined types, each of which has a corresponding quota attached to it. Note that it has a DEFAULT type, the type that jobs without defined types fall under. + - E.g.) TYPE_0, TYPE_1, ..., DEFAULT +- Quota: a number referring to a relative ratio that determines what portion of given resources should be allotted to a particular quota type. + - E.g.) TYPE_0: 40, TYPE_1: 20, ..., DEFAULT: 40 + + +## Architecture + +### AssignableInstance + +AssignableInstance is an abstraction that represents each live Participant that is able to take on tasks from the Controller. Each AssignableInstance will cache what tasks it has running as well as remaining task counts from the quota-based capacity calculation. + +### AssignableInstanceManager + +AssignableInstanceManager manages all AssignableInstances. It also serves as a connecting layer between the Controller and each AssignableInstance. AssignableInstanceManager also provides a set of interfaces that allows the Controller to easily determine whether an AssignableInstance is able to take on more tasks. + +### TaskAssigner + +The TaskAssigner interface provides basic API methods that involve assignments of tasks based on quota constraints. Currently, Task Framework only concerns the number of Participant-side JVM threads, each of which corresponds to an active task. + +### RuntimeJobDag (JobDagIterator) + +This new component serves as an iterator for JobDAGs for the Controller. Previously, task assignment required the Controller to iterate through all jobs and their underlying tasks to determine whether there were any tasks that needed to be assigned and scheduled. This proved to be inefficient and did not scale with the increasing load we were putting on Task Framework. Each RuntimeJobDag records states, that is, it knows what task needs to be offered up to the Controller for scheduling. This saves the redundant computation for the Controller every time it goes through the TaskSchedulingStage of the Task pipeline. + +![Architecture](/website/0.8.3/src/site/resources/images/quota_InstanceCapacityManager.jpeg) + +## User Manual + +### Review of Task Framework Glossary + +- Task Framework: a component of Apache Helix. A framework on which users can define and run workflows, jobs, and tasks in a distributed way. +- Workflow: the largest unit of work in Task Framework. A workflow consists of one or more jobs. There are two types of workflows: + - Generic workflow: a generic workflow is a workflow consisting of jobs (a job DAG) that are used for general purposes. A generic workflow may be removed if expired or timed out. + - Job queue: a job queue is a special type of workflow consisting of jobs that tend to have a linear dependency (this dependency is configurable, however). There is no expiration for job queues - it lives on until it is deleted. +- Job: the second largest unit of work in Task Framework. A job consists of one or more mutually independent tasks. There are two types of jobs: + - Generic job: a generic job is a job consisting of one or more tasks. + - Targeted job: a targeted job differs from generic jobs in that these jobs must have a _target resource_, and the tasks belonging to such jobs will be scheduled alongside the partitions of the target resource. To illustrate, an Espresso user of Task Framework may wish to schedule a backup job on one of their DBs called _MemberDataDB_. This DB will be divided into multiple partitions (_MemberDataDB_1, _MemberDataDB_2, ... _MemberDataDB_N)___, and suppose that a targeted job is submitted such that its tasks will be paired up with each of those partitions. This "pairing-up" is necessary because this task is a backup task that needs to be on the same physical machine as those partitions the task is backing up. +- Task: the smallest unit of work in Task Framework. A task is an independent unit of work. +- Quota resource type: denotes a particular type of resource. Examples would be JVM thread count, memory, CPU resources, etc.. Generally, each task that runs on a Helix Participant (= instance, worker, node) occupies a set amount of resources. **Note that only JVM thread count is the only quota resource type currently supported by Task Framework, with each task occupying 1 thread out of 40 threads available per Helix Participant (instance).** Review comment: Combined the two glossary sections into one. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services