[ 
https://issues.apache.org/jira/browse/HELIX-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hunter L updated HELIX-791:
---------------------------
    Description: 
Job list iterator methods and an underlying data structure were added to JobDag to 
support retrieval of jobs by TaskDispatcher (to be implemented), as part of the 
Task Framework improvements.

Changelist:
 1. Add RuntimeJobDag
 2. Add a unit test

See below for the design considerations for this internal Helix component:

h1. Purpose

For the new generation of the Helix Task Framework, we need an intermediary 
iterator that extracts the list of jobs to be processed from WorkflowConfig 
(jobDag) and WorkflowContext (jobStates). This document proposes a design for it 
and discusses a few important points regarding performance and ease of use.
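
As a rough sketch only, here is the shape such an iterator could take. 
_getNextJob()_ is named later in this document; _finishJob()_ and _hasNextJob()_ 
are illustrative assumptions, not a committed API.

{code:java}
// Sketch of the job iterator's shape. getNextJob() is named in this design;
// finishJob() and hasNextJob() are illustrative assumptions only.
public interface JobListIterator {
  // Returns the name of the next job whose DAG parents have all finished,
  // or null if no job is ready right now.
  String getNextJob();

  // Marks a previously returned job as finished, possibly making its
  // successors in the DAG ready.
  void finishJob(String jobName);

  // True while any job is still ready or in flight.
  boolean hasNextJob();
}
{code}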
h1. Requirements
 # Given a workflow or a JobQueue, it needs to iterate over all jobs in the 
workflow and produce jobs in an order that *satisfies the topological order* 
of the DAG.
 # (Optional - discuss) The JobListIterator must produce a topological ordering 
of jobs in a *deterministic* way (easily doable with ordered data structures 
such as TreeMap; see the sketch after this list).
 # The JobListIterator must do so in an *efficient* manner and should avoid 
overhead where possible.
 # (Optional) Upon Controller switch, it needs to be able to re-build itself 
and restore the job status where possible, based on WorkflowContext/JobContext.
 # JobDAG must be backward-compatible.
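
To make requirement 2 concrete, here is a minimal Java sketch of a deterministic 
topological ordering. The class and method names are made up for illustration; 
the sorted collections (TreeMap/TreeSet) are what provide the deterministic 
tie-breaking suggested above.

{code:java}
import java.util.*;

// Sketch: deterministic topological ordering of a job DAG (requirement 2).
// Sorted collections ensure that jobs with no ordering constraint between
// them are always emitted in the same (lexicographic) order.
public final class DeterministicTopoSort {
  // parentsToChildren: DAG adjacency, job name -> names of its child jobs.
  public static List<String> sort(Map<String, Set<String>> parentsToChildren) {
    // Count incoming edges per job; TreeMap keeps iteration order stable.
    Map<String, Integer> inDegree = new TreeMap<>();
    parentsToChildren.forEach((parent, children) -> {
      inDegree.putIfAbsent(parent, 0);
      for (String child : children) {
        inDegree.merge(child, 1, Integer::sum);
      }
    });

    // Seed with un-parented jobs; TreeSet makes tie-breaking deterministic.
    TreeSet<String> ready = new TreeSet<>();
    inDegree.forEach((job, degree) -> {
      if (degree == 0) {
        ready.add(job);
      }
    });

    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String job = ready.pollFirst(); // smallest ready job name first
      order.add(job);
      for (String child : parentsToChildren.getOrDefault(job, Set.of())) {
        if (inDegree.merge(child, -1, Integer::sum) == 0) {
          ready.add(child);
        }
      }
    }
    if (order.size() != inDegree.size()) {
      throw new IllegalStateException("DAG contains a cycle");
    }
    return order;
  }
}
{code}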

h1. Architecture & Design
h2. Background

Optimal scheduling for a given DAG and a given number of processors (in this 
case, "consumers", or _Helix Participants_ that receive tasks) is a widely 
studied problem in CS. For general DAGs with a variable number of processors, 
however, this problem is *NP-complete*. For 2 processors there exists a 
polynomial-time algorithm, but above 2 the complexity is unknown. It has been 
shown, though, that when the DAG is a tree, level-by-level scheduling is 
optimal (Aho, Hopcroft).

The heuristic we could utilize here is the *list scheduling algorithm*. The 
fact that we do not have to consider the cost/duration of jobs greatly 
simplifies the problem (which also saves us the trouble of worrying about the 
optimality of the solution). Below I present simplified pseudocode for the 
list scheduling algorithm, tailored to our needs.

h2. Algorithm
{code}
ready-list = {START};  // START = set of un-parented jobs (jobs that do not depend on any other jobs)
inflight-list = {};
while (|ready-list| + |inflight-list| > 0) {
    for each node n in ready-list in deterministic order {  // schedule new tasks
        if (Helix Participant has capacity) {
            remove n from ready-list and add it to inflight-list;
            return n to the Helix Participant;
        } else {
            break;
        }
    }
    for each node n in inflight-list {  // determine ready tasks
        if (Helix Participant finishes the job) {
            remove n from inflight-list;
            add every ready successor of n in the DAG to ready-list;
        }
    }
}
{code}

This will be adapted at implementation time to expose iterator-style APIs such 
as _getNextJob()_.
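
For illustration, a hedged Java sketch of how the loop above could be reshaped 
into such iterator APIs. RuntimeJobDag and _getNextJob()_ appear in this change; 
the remaining names, and the String-based DAG representation, are assumptions 
for the sketch.

{code:java}
import java.util.*;

// Sketch of the list-scheduling loop reshaped into iterator-style APIs.
// RuntimeJobDag and getNextJob() appear in this change; everything else
// here is an illustrative assumption.
public class RuntimeJobDagSketch {
  private final Map<String, Set<String>> parentsToChildren;   // DAG edges
  private final Map<String, Integer> unfinishedParents = new HashMap<>();
  private final TreeSet<String> readyList = new TreeSet<>();  // deterministic order
  private final Set<String> inflightList = new HashSet<>();

  public RuntimeJobDagSketch(Map<String, Set<String>> parentsToChildren) {
    this.parentsToChildren = parentsToChildren;
    // Count unfinished parents per job; un-parented jobs form the START set.
    parentsToChildren.forEach((parent, children) -> {
      unfinishedParents.putIfAbsent(parent, 0);
      for (String child : children) {
        unfinishedParents.merge(child, 1, Integer::sum);
      }
    });
    unfinishedParents.forEach((job, parents) -> {
      if (parents == 0) {
        readyList.add(job);
      }
    });
  }

  // Hand out the next ready job (smallest name first) and mark it in flight,
  // or return null when nothing is ready right now.
  public String getNextJob() {
    String job = readyList.pollFirst();
    if (job != null) {
      inflightList.add(job);
    }
    return job;
  }

  // Called when a Participant finishes a job: successors whose parents have
  // all completed become ready.
  public void finishJob(String job) {
    if (!inflightList.remove(job)) {
      throw new IllegalArgumentException(job + " is not in flight");
    }
    for (String child : parentsToChildren.getOrDefault(job, Set.of())) {
      if (unfinishedParents.merge(child, -1, Integer::sum) == 0) {
        readyList.add(child);
      }
    }
  }

  // True while any job is still ready or in flight.
  public boolean hasNextJob() {
    return !readyList.isEmpty() || !inflightList.isEmpty();
  }
}
{code}

A TreeSet for the ready list gives the deterministic ordering of requirement 2; 
a plain FIFO queue would work equally well if determinism turns out not to be 
required.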
h1. Alternative Designs
h2. Flattening the DAG

The problems with this approach (vs. unfolding the DAG one step at a time using 
a ready list and an in-flight list) are the following:
 # Once flattened into a topological ordering, the DAG loses its dependency 
information.
 # It could limit the parallelizability of the DAG.
 # Failed jobs could block the execution of other jobs because of artificial 
dependencies created by flattening (see the example below).
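
To make points 2 and 3 concrete, consider a diamond-shaped DAG. The snippet 
below reuses the hypothetical RuntimeJobDagSketch from the Algorithm section 
above.

{code:java}
import java.util.*;

// Diamond DAG: A -> {B, C} -> D, where B and C are independent of each other.
public class FlatteningExample {
  public static void main(String[] args) {
    Map<String, Set<String>> dag = Map.of(
        "A", Set.of("B", "C"),
        "B", Set.of("D"),
        "C", Set.of("D"),
        "D", Set.of());
    RuntimeJobDagSketch it = new RuntimeJobDagSketch(dag);
    it.finishJob(it.getNextJob());        // A is dispatched and completes
    System.out.println(it.getNextJob());  // B
    System.out.println(it.getNextJob());  // C: B and C can run in parallel
    // A flattened order such as [A, B, C, D] would instead serialize B
    // before C, and a failure of B would block C even though C's only
    // real parent, A, already succeeded.
  }
}
{code}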


  was:
Job list iterator methods and an underlying data structure were added to JobDag 
to support retrieval of jobs by TaskDispatcher (to be implemented), as part of 
the Task Framework improvements.

Changelist:
1. Add RuntimeJobDag
2. Add a unit test


> TASK2.0: Add RuntimeJobDag with job iterator functionality
> ----------------------------------------------------------
>
>                 Key: HELIX-791
>                 URL: https://issues.apache.org/jira/browse/HELIX-791
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Hunter L
>            Assignee: Hunter L
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
