[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bo Wang updated MAPREDUCE-4495:
-------------------------------
Attachment: MAPREDUCE-4495-v1.patch
Hello all,
Attached is the first patch for the Workflow AM (v1) as described in the design
doc. This first version aims to demonstrate how to run a simple Java/MR
workflow. Several improvements will come with v2 soon.
You can try it with the following instructions.
Building Instructions
---------------------
git clone git://git.apache.org/hadoop-common.git hadoop
cd hadoop
patch -p1 < MAPREDUCE-4495-v1.patch
mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true
The resulting Workflow Application Master TARBALL is at:
hadoop-mapreduce-project/hadoop-mapreduce-workflow/target/hadoop-mapreduce-workflow-3.0.0-SNAPSHOT.tar
Running an Example
------------------
You need a cluster running trunk; a pseudo-distributed cluster is good enough.
Expand the Workflow Application Master TARBALL
Go into the hadoop-mapreduce-workflow-3.0.0-SNAPSHOT/ directory
Set YARN_HOME to your Hadoop root directory
Run the wfam.sh script:
./wfam.sh -wf_xml example/wf-fork-join-java-job.xml \
  -job_xml example/job-sample.xml \
  -log_properties example/log4j-sample.properties
Look at the Workflow Application Master logs for details on the run.
> Workflow Application Master in YARN
> -----------------------------------
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.0.0-alpha
> Reporter: Bo Wang
> Assignee: Bo Wang
> Attachments: MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf
>
>
> It is useful to have a workflow application master, which will be capable of
> running a DAG of jobs. The workflow client submits a DAG request to the AM
> and then the AM will manage the life cycle of this application in terms of
> requesting the needed resources from the RM, and starting, monitoring and
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master,
> these are some of the advantages:
> - Fewer resources consumed, since only one application master will
> be spawned for the whole workflow.
> - Reuse of resources, since the same resources can be used by multiple
> consecutive jobs in the workflow (no need to request/wait for resources for
> every individual job from the central RM).
> - More optimization opportunities in terms of collective resource requests.
> - Optimization opportunities in terms of rewriting and composing jobs in the
> workflow (e.g. pushing down Mappers).
> - This Application Master can be reused/extended by higher-level systems like
> Pig and Hive to provide an optimized way of running their workflows.
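
As a rough illustration of the "DAG of jobs" idea in the description above, here
is a minimal Java sketch that derives a dependency-respecting run order for a
small fork/join workflow using Kahn's algorithm. It is not taken from the patch:
the class name WorkflowDagSketch, the method executionOrder, and the job names
are all hypothetical, and a real AM would request containers and launch each job
where the sketch only records the order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not part of MAPREDUCE-4495.
public class WorkflowDagSketch {

    /**
     * Returns the jobs in an order that respects their dependencies
     * (Kahn's algorithm). deps maps each job name to the jobs it waits for.
     */
    public static List<String> executionOrder(Map<String, List<String>> deps) {
        // Number of unmet dependencies per job.
        Map<String, Integer> pending = new LinkedHashMap<>();
        // Reverse edges: job -> jobs that wait for it.
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Jobs with no unmet dependencies can run immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job); // a real AM would launch the job here
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        // Any job left over is part of a cycle or waits on an undeclared job.
        if (order.size() != deps.size()) {
            throw new IllegalArgumentException("cycle or unknown dependency in workflow");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("fork", List.of());
        deps.put("jobA", List.of("fork"));
        deps.put("jobB", List.of("fork"));
        deps.put("join", List.of("jobA", "jobB"));
        System.out.println(executionOrder(deps)); // prints [fork, jobA, jobB, join]
    }
}
```

In this sketch jobA and jobB become ready at the same time once "fork" completes,
which is where a single AM could reuse already-allocated resources instead of
going back to the RM for each job.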