[
https://issues.apache.org/jira/browse/HELIX-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hunter L updated HELIX-791:
---------------------------
Description:
Job list iterator methods and underlying data structure were added to JobDag to
support retrieval of jobs by TaskDispatcher (to be implemented) for improvement
in Task Framework.
Changelist:
1. Add RuntimeJobDag
2. Add a unit test
See below for the design considerations for this internal Helix component:
h1. Purpose
For the new generation of Helix Task Framework, we need an intermediary
iterator that extracts a list of jobs that need to be processed from
WorkflowConfig (jobDag) and WorkflowContext (jobStates). This document proposes
a design for it and presents a few important points of discussion for
performance and ease of use.
h1. Requirements
# Given a workflow or a JobQueue, it needs to iterate over all jobs in the
workflow and produce them in an order that *satisfies a topological ordering*
of the DAG.
# (Optional - discuss) The JobListIterator must produce a topological ordering
of jobs in a *deterministic* way (easily doable by using ordered data
structures such as TreeMap).
# The JobListIterator must do so in an *efficient* manner and should avoid
overhead where possible.
# (Optional) Upon Controller switch, it needs to be able to re-build itself
and restore the job status where possible based on WorkflowContext/JobContext.
# JobDAG must be backward-compatible.
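To make requirement 2 concrete, here is a minimal sketch of a deterministic topological sort using Kahn's algorithm with a TreeSet as the ready set (an ordered structure in the same spirit as the TreeMap suggested above). The class name, the plain string-keyed successor map, and the job names are hypothetical, not Helix APIs.

```java
import java.util.*;

public class DeterministicTopoSort {
    // Kahn's algorithm with a TreeSet as the ready set, so jobs that become
    // ready at the same time are emitted in lexicographic (deterministic) order.
    public static List<String> sort(Map<String, Set<String>> successors) {
        // Compute each job's in-degree (number of unfinished parents).
        Map<String, Integer> inDegree = new HashMap<>();
        for (String job : successors.keySet()) inDegree.putIfAbsent(job, 0);
        for (Set<String> succs : successors.values())
            for (String s : succs) inDegree.merge(s, 1, Integer::sum);

        TreeSet<String> ready = new TreeSet<>();  // ordered => deterministic
        for (Map.Entry<String, Integer> e : inDegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.pollFirst();
            order.add(job);
            // A successor becomes ready once all of its parents are done.
            for (String s : successors.getOrDefault(job, Set.of()))
                if (inDegree.merge(s, -1, Integer::sum) == 0) ready.add(s);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dag = Map.of(
            "jobA", Set.of("jobC"),
            "jobB", Set.of("jobC"),
            "jobC", Set.of());
        System.out.println(sort(dag)); // [jobA, jobB, jobC]
    }
}
```

Running the sort twice on the same DAG always yields the same ordering, which keeps scheduling decisions reproducible across Controller runs.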
h1. Architecture & Design
h2. Background
Optimal scheduling for a given DAG and a given number of processors (in this
case, "consumers", or _Helix Participants_ that receive tasks) is a widely
studied problem in CS. For general DAGs with a variable number of processors,
however, the problem is *NP-complete*. For 2 processors there exists a
polynomial-time algorithm, but for more than 2 the complexity is unknown. It
has been shown, though, that when the DAG is a tree, level-by-level scheduling
is optimal (Aho, Hopcroft).
The heuristic we can utilize here is the *list scheduling algorithm*. The
fact that we do not have to consider the cost/duration of jobs greatly
simplifies the problem (and saves us the trouble of worrying about the
optimality of the solution). Below is a simplified pseudocode version of the
list scheduling algorithm, tailored to our needs.
h2. Algorithm
{code}
ready-list    = { START };  // START = un-parented jobs (jobs that do not depend on any other jobs)
inflight-list = { };
while (|ready-list| + |inflight-list| > 0) {
  for each node n in ready-list, in deterministic order {  // schedule new jobs
    if (a Helix Participant has capacity) {
      remove n from ready-list and add it to inflight-list;
      hand n to the Helix Participant;
    } else {
      break;
    }
  }
  for each node n in inflight-list {  // determine newly ready jobs
    if (the Helix Participant has finished n) {
      remove n from inflight-list;
      add every successor of n whose parents have all finished to ready-list;
    }
  }
}
{code}
This will be further adjusted at implementation time to expose iterator APIs
such as _getNextJob()_.
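The pseudocode above can be sketched as an iterator-style class. This is a hypothetical illustration only: the class name, the `getNextJob()`/`markJobFinished()` method shapes, and the string-keyed successor map are assumptions, not the actual RuntimeJobDag API.

```java
import java.util.*;

// Sketch of the list-scheduling iterator: a ready list of schedulable jobs
// and an in-flight set of jobs handed to Participants but not yet finished.
public class JobIteratorSketch {
    private final Map<String, Set<String>> successors;    // job -> dependent jobs
    private final Map<String, Integer> unfinishedParents; // remaining in-degree
    private final Deque<String> readyList = new ArrayDeque<>();
    private final Set<String> inflightList = new HashSet<>();

    public JobIteratorSketch(Map<String, Set<String>> successors) {
        this.successors = successors;
        // TreeMap so the initial ready list is built in deterministic order.
        this.unfinishedParents = new TreeMap<>();
        for (String job : successors.keySet()) unfinishedParents.putIfAbsent(job, 0);
        for (Set<String> succs : successors.values())
            for (String s : succs) unfinishedParents.merge(s, 1, Integer::sum);
        // START set: jobs with no parents are ready immediately.
        unfinishedParents.forEach((job, deg) -> { if (deg == 0) readyList.add(job); });
    }

    public boolean hasNextJob() { return !readyList.isEmpty(); }

    // Hand the next ready job to a Participant; it moves to the in-flight set.
    public String getNextJob() {
        String job = readyList.poll();
        if (job != null) inflightList.add(job);
        return job;
    }

    // Called when a Participant reports completion; successors whose parents
    // have all finished become ready.
    public void markJobFinished(String job) {
        if (!inflightList.remove(job)) return;
        for (String s : successors.getOrDefault(job, Set.of()))
            if (unfinishedParents.merge(s, -1, Integer::sum) == 0) readyList.add(s);
    }
}
```

With a diamond-free DAG such as {a→c, b→c}, _getNextJob()_ yields a and b immediately, and c only after both are marked finished, which is exactly the one-step-at-a-time unfolding the design calls for.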
h1. Alternative Designs
h2. Flattening the DAG
The problems with this approach (vs. unfolding the DAG one step at a time using
a ready list and an in-flight list) are the following:
# Once flattened with a topological ordering, the DAG loses its dependency
information.
# It could limit parallelizability of the DAG.
# Failed jobs could potentially block the execution of other jobs due to
artificial dependencies created in the process of flattening.
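A small, hypothetical illustration of points 2 and 3: with edges A→C and B→D, a flattened order such as [A, B, C, D] makes C wait on B even though they are independent, so a failure of B blocks C artificially. (The job names and maps below are illustrative only.)

```java
import java.util.*;

public class FlatteningDemo {
    public static void main(String[] args) {
        // DAG: A -> C, B -> D (C depends only on A; D depends only on B).
        // One legal flattened (topological) order is [A, B, C, D].
        List<String> flattened = List.of("A", "B", "C", "D");

        // If B fails, sequential execution of the flattened list blocks
        // everything after it -- including C, which does not depend on B.
        String failed = "B";
        List<String> blockedByFlattening =
            flattened.subList(flattened.indexOf(failed) + 1, flattened.size());
        System.out.println(blockedByFlattening); // [C, D]

        // Under the real DAG, only B's true successors are blocked.
        Map<String, Set<String>> successors = Map.of(
            "A", Set.of("C"), "B", Set.of("D"), "C", Set.of(), "D", Set.of());
        System.out.println(successors.get(failed)); // [D]
    }
}
```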
was:
Job list iterator methods and underlying data structure were added to JobDag to
support retrieval of jobs by TaskDispatcher (to be implemented) for improvement
in Task Framework.
Changelist:
1. Add RuntimeJobDag
2. Add a unit test
> TASK2.0: Add RuntimeJobDag with job iterator functionality
> ----------------------------------------------------------
>
> Key: HELIX-791
> URL: https://issues.apache.org/jira/browse/HELIX-791
> Project: Apache Helix
> Issue Type: Improvement
> Reporter: Hunter L
> Assignee: Hunter L
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)