[jira] [Commented] (MAPREDUCE-3315) Master-Worker Application on YARN

Nikhil S. Ketkar (Commented) (JIRA) Wed, 21 Mar 2012 02:44:08 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234229#comment-13234229
 ]


Nikhil S. Ketkar commented on MAPREDUCE-3315:
---------------------------------------------

I have started some work on this issue and I wanted to describe the overall 
approach I am taking and get some feedback. 

Let me start by describing how an application will be written using 
Master-Worker. The pseudocode below illustrates the usage. 

{code}
class WordCount {
    // Word count implementation using Master-Worker paradigm.
    // Master sends portions of the input file to workers
    // Worker builds a word count hash map (word -> count) and sends it as a 
result unit
    // Master uses the results to update master dictionary 

    class WorkUnit {
    // A list of words
    }

    class ResultUnit {
    // Map of word -> count
    }

    class Master extends MasterRunner<WorkUnit, ResultUnit> {
        public manageWorkers() {
            // Spawn a bunch of workers using spawnNewWorker();
            // For each spawned worker spawnNewWorker() will return a 
WorkerReference, keep it around for bookkeeping
            // Read a file and create WorkUnits with a fixed number of lines to 
each workers
            // Assign WorkUnits to Workers using assignWork(), use the 
previously obtained WorkerReference
            // Wait for ResultUnits using waitForResult();
            // Update master dictionary based on a received result unit
            // After work is done, kill workers using terminateWorker()
       }
    }

   class Worker extends WorkerRunner<WorkUnit, ResultUnit>  {
        public ResultUnit doWork(WorkUnit wu) {
       // Build a word count hash map (word -> count) and sends it as a result 
unit
     }
   }
}

// User writes code to setup and launch job
public static void main(String[] args) throws Exception {
    MWJobConf conf = new MWJobConf(WordCount.class);
    conf.setJobName("WordCount");
    conf.setMasterClass(Master.class);
    conf.setWorkerClass(Worker.class);
    conf.setWorkUnitClass(WorkUnit.class);
    conf.setResultUnitClass(ResultUnit.class);
   MWJobClient.runJob(conf);
 }
}
{code}

Key functionality for the Master Worker will be implemented in the following 
classes.

{code}
class MasterRunner<W, R> {
    // Spawn new Worker
    protected WorkerReference spawnNewWorker();

    // Spawn new Workers
    protected ArrayList<WorkerReference> spawnNewWorkers();

    // Assign work to any Worker, returns WorkerReference to whom it was 
assigned
    protected WorkerReference assignWork(WorkUnit wu);

    // Assign work to a specific Worker
    protected void assignWork(WorkerReference wf, WorkUnit wu);

    // Wait for result from any Worker. Blocking Call
    protected ResultUnit waitForResult();

    // Wait for result from a specific worker. Blocking Call
    protected ResultUnit waitForResult(WorkerReference wf);

    // Is this specific worker alive? Blocking Call
    protected boolean isWorkerAlive(WorkerReference wf);

    // Get alive workers? Blocking Call
    protected ArrayList<WorkerReference> isWorkerAlive(WorkerReference wf);

    // Terminate a specific Worker
    protected void terminateWorker(WorkerReference wf);

    // To be implemented by user
    public manageWorkers() = 0;
}

class WorkerRunner<W, R> {
    // To be implemented by user
    public ResultUnit doWork(WorkUnit wu) = 0;
}

class WorkUnitContainer<W> {
  // The framework passes around WorkUnitContainers which contain the user 
defined 
  // WorkUnit and some additional bookkeeping information
}

class ResultUnitContainer<R> {
 // The framework passes around ResultUnitContainers which contain the user 
defined 
 // ResultUnit and some additional bookkeeping information
}

class WorkerReference {
 // Uniquely identifies the Workers
}
{code}

A few questions I have been thinking about are:
# Should I use Hadoop IPC or RMI or something else?
# Should the Master be "in" the ApplicationManager or be run as a Container?

                
> Master-Worker Application on YARN
> ---------------------------------
>
>                 Key: MAPREDUCE-3315
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3315
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>             Fix For: 0.24.0
>
>
> Currently master worker scenarios are forced fit into Map-Reduce. Now with 
> YARN, these can be first class and would benefit real/near realtime workloads 
> and be more effective in using the cluster resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3315) Master-Worker Application on YARN

Reply via email to