[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Wang updated MAPREDUCE-4495:
-------------------------------

    Attachment: MAPREDUCE-4495-v1.patch

Hello all,

Attached is the first patch (v1) for the Workflow AM described in the design
doc (MapReduceWorkflowAM.pdf). This first version aims to demonstrate how to
run a simple Java/MR workflow. Several improvements will follow soon in v2.

You can try it with the following instructions.

Building Instructions
---------------------

  git clone git://git.apache.org/hadoop-common.git hadoop
  cd hadoop
  patch -p1 < MAPREDUCE-4495-v1.patch
  mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

  The resulting Workflow Application Master tarball is at:
    hadoop-mapreduce-project/hadoop-mapreduce-workflow/target/hadoop-mapreduce-workflow-3.0.0-SNAPSHOT.tar

Running an Example
--------------------

  You need a cluster running trunk; a pseudo-distributed cluster is good enough.

  Expand the Workflow Application Master tarball.

  Go into the hadoop-mapreduce-workflow-3.0.0-SNAPSHOT/ directory

  Set YARN_HOME to your Hadoop root directory

  Run the wfam.sh script:
    ./wfam.sh -wf_xml example/wf-fork-join-java-job.xml \
      -job_xml example/job-sample.xml \
      -log_properties example/log4j-sample.properties

  Look at the Workflow Application Master logs for details on the run.


                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>         Attachments: MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf
>
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will be
> spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems
> like Pig and Hive to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
