[ https://issues.apache.org/jira/browse/HADOOP-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718683#action_12718683 ]
Kevin Peterson commented on HADOOP-5303:
----------------------------------------

I'm building this now. The setup-maven.sh and setup-jars.sh scripts have a
shebang for /bin/sh, but they actually rely on bash functions. At least on my
Ubuntu 9.04, I had to change the shebang to /bin/bash to run them.

> Oozie, Hadoop Workflow System
> -----------------------------
>
>                 Key: HADOOP-5303
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5303
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: Hadoop_Summit_Oozie.pdf, hws-preso-v1_0_2009FEB22.pdf,
> hws-spec2009MAR09.pdf, hws-v1_0_2009FEB22.pdf,
> oozie-0.18.3.o0.1-SNAPSHOT-distro.tar.gz, oozie-spec-20090521.pdf,
> oozie-src-20090605.tar.gz
>
>
> This is a proposal for a system specialized in running Hadoop/Pig jobs in a
> control dependency DAG (Directed Acyclic Graph), a Hadoop workflow
> application.
> A complete specification and a high-level overview presentation are
> attached.
> ----
> *Highlights*
> A Workflow application is a DAG that coordinates the following types of
> actions: Hadoop, Pig, Ssh, Http, Email and sub-workflows.
> Flow control operations within the workflow applications can be done using
> decision, fork and join nodes. Cycles in workflows are not supported.
> Actions and decisions can be parameterized with job properties, action
> output (i.e. Hadoop counters, Ssh key/value pairs output) and file
> information (file exists, file size, etc). Formal parameters are expressed in
> the workflow definition as {{${VAR}}} variables.
> A Workflow application is a ZIP file that contains the workflow definition
> (an XML file) and all the files needed to run the actions: JAR files for
> Map/Reduce jobs, shells for streaming Map/Reduce jobs, native libraries, Pig
> scripts, and other resource files.
> Before running a workflow job, the corresponding workflow application must be
> deployed in HWS.
> Deploying workflow applications and running workflow jobs can be done via
> command line tools, a WS API and a Java API.
> Monitoring the system and workflow jobs can be done via a web console,
> command line tools, a WS API and a Java API.
> When submitting a workflow job, a set of properties resolving all the formal
> parameters in the workflow definition must be provided. This set of
> properties is a Hadoop configuration.
> Possible states for a workflow job are: {{CREATED}}, {{RUNNING}},
> {{SUSPENDED}}, {{SUCCEEDED}}, {{KILLED}} and {{FAILED}}.
> In the case of an action failure in a workflow job, depending on the type of
> failure, HWS will attempt automatic retries, request a manual retry, or fail
> the workflow job.
> HWS can make HTTP callback notifications on action start/end/failure events
> and workflow end/failure events.
> In the case of workflow job failure, the workflow job can be resubmitted
> skipping previously completed actions. Before doing a resubmission the
> workflow application could be updated with a patch to fix a problem in the
> workflow application code.
> ----

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
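
As a rough illustration of the submission rule in the quoted highlights (every
${VAR} formal parameter in the workflow definition must be resolved by the
supplied job properties), here is a minimal Python sketch. It is not Oozie/HWS
code: the function names, the pattern for valid parameter names, and the
sample definition string are all assumptions made up for this example.

```python
import re

# Matches formal parameters written as ${VAR}. Which characters are allowed
# in a name is an assumption here, not something taken from the specification.
PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def formal_parameters(definition):
    """Return the set of ${VAR} names that appear in a workflow definition."""
    return set(PARAM.findall(definition))


def resolve(definition, properties):
    """Substitute every ${VAR} with its value from the job properties.

    Raises ValueError if any formal parameter has no matching property,
    mirroring the rule that all parameters must be resolved at submission.
    """
    missing = formal_parameters(definition) - set(properties)
    if missing:
        raise ValueError("unresolved parameters: " + ", ".join(sorted(missing)))
    return PARAM.sub(lambda m: properties[m.group(1)], definition)


if __name__ == "__main__":
    # Hypothetical fragment; real workflow definitions are XML files.
    sample = "input=${inputDir} output=${outputDir}"
    print(resolve(sample, {"inputDir": "/data/in", "outputDir": "/data/out"}))
```

Checking for missing names before substituting means a submission with an
incomplete property set is rejected up front instead of running with a literal
${VAR} left in the definition.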