Added:
websites/staging/oozie/trunk/content/docs/5.2.0/WorkflowFunctionalSpec.html
==============================================================================
--- websites/staging/oozie/trunk/content/docs/5.2.0/WorkflowFunctionalSpec.html
(added)
+++ websites/staging/oozie/trunk/content/docs/5.2.0/WorkflowFunctionalSpec.html
Fri Dec 6 09:10:49 2019
@@ -0,0 +1,4960 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2019-12-05
+ | Rendered using Apache Maven Fluido Skin 1.4
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="Date-Revision-yyyymmdd" content="20191205" />
+ <meta http-equiv="Content-Language" content="en" />
+ <title>Oozie – Oozie Specification, a Hadoop Workflow System</title>
+ <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
+ <link rel="stylesheet" href="./css/site.css" />
+ <link rel="stylesheet" href="./css/print.css" media="print" />
+
+
+ <script type="text/javascript"
src="./js/apache-maven-fluido-1.4.min.js"></script>
+
+
+ </head>
+ <body class="topBarDisabled">
+
+
+
+ <div class="container-fluid">
+ <div id="banner">
+ <div class="pull-left">
+ <a href="https://oozie.apache.org/"
id="bannerLeft">
+
<img src="https://oozie.apache.org/images/oozie_200x.png"
alt="Oozie"/>
+ </a>
+ </div>
+ <div class="pull-right"> </div>
+ <div class="clear"><hr/></div>
+ </div>
+
+ <div id="breadcrumbs">
+ <ul class="breadcrumb">
+
+
+ <li class="">
+ <a href="http://www.apache.org/" class="externalLink"
title="Apache">
+ Apache</a>
+ <span class="divider">/</span>
+ </li>
+ <li class="">
+ <a href="../../" title="Oozie">
+ Oozie</a>
+ <span class="divider">/</span>
+ </li>
+ <li class="">
+ <a href="../" title="docs">
+ docs</a>
+ <span class="divider">/</span>
+ </li>
+ <li class="">
+ <a href="./" title="5.2.0">
+ 5.2.0</a>
+ <span class="divider">/</span>
+ </li>
+ <li class="active "></li>
+
+
+
+ <li id="publishDate" class="pull-right"><span
class="divider">|</span> Last Published: 2019-12-05</li>
+ <li id="projectVersion" class="pull-right">
+ Version: 5.2.0
+ </li>
+
+ </ul>
+ </div>
+
+
+ <div class="row-fluid">
+ <div id="leftColumn" class="span2">
+ <div class="well sidebar-nav">
+
+
+ <ul class="nav nav-list">
+ </ul>
+
+
+
+ <hr />
+
+ <div id="poweredBy">
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <div class="clear"></div>
+ <a href="http://maven.apache.org/" title="Built
by Maven" class="poweredBy">
+ <img class="builtBy" alt="Built by Maven"
src="./images/logos/maven-feather.png" />
+ </a>
+ </div>
+ </div>
+ </div>
+
+
+ <div id="bodyColumn" class="span10" >
+
+ <p><a href="index.html">::Go back to Oozie Documentation
Index::</a></p><hr />
+<h1>Oozie Specification, a Hadoop Workflow System</h1>
+<p>The goal of this document is to define a workflow engine system specialized
in coordinating the execution of Hadoop Map/Reduce and Pig jobs.</p>
+<ul>
+<li><a href="#Changelog">Changelog</a></li>
+<li><a href="#a0_Definitions">0 Definitions</a></li>
+<li><a href="#a1_Specification_Highlights">1 Specification Highlights</a></li>
+<li><a href="#a2_Workflow_Definition">2 Workflow Definition</a>
+<ul>
+<li><a href="#a2.1_Cycles_in_Workflow_Definitions">2.1 Cycles in Workflow
Definitions</a></li></ul></li>
+<li><a href="#a3_Workflow_Nodes">3 Workflow Nodes</a>
+<ul>
+<li><a href="#a3.1_Control_Flow_Nodes">3.1 Control Flow Nodes</a>
+<ul>
+<li><a href="#a3.1.1_Start_Control_Node">3.1.1 Start Control Node</a></li>
+<li><a href="#a3.1.2_End_Control_Node">3.1.2 End Control Node</a></li>
+<li><a href="#a3.1.3_Kill_Control_Node">3.1.3 Kill Control Node</a></li>
+<li><a href="#a3.1.4_Decision_Control_Node">3.1.4 Decision Control
Node</a></li>
+<li><a href="#a3.1.5_Fork_and_Join_Control_Nodes">3.1.5 Fork and Join Control
Nodes</a></li></ul></li>
+<li><a href="#a3.2_Workflow_Action_Nodes">3.2 Workflow Action Nodes</a>
+<ul>
+<li><a href="#a3.2.1_Action_Basis">3.2.1 Action Basis</a>
+<ul>
+<li><a href="#a3.2.1.1_Action_ComputationProcessing_Is_Always_Remote">3.2.1.1
Action Computation/Processing Is Always Remote</a></li>
+<li><a href="#a3.2.1.2_Actions_Are_Asynchronous">3.2.1.2 Actions Are
Asynchronous</a></li>
+<li><a href="#a3.2.1.3_Actions_Have_2_Transitions_ok_and_error">3.2.1.3
Actions Have 2 Transitions, ok and error</a></li>
+<li><a href="#a3.2.1.4_Action_Recovery">3.2.1.4 Action
Recovery</a></li></ul></li>
+<li><a href="#a3.2.2_Map-Reduce_Action">3.2.2 Map-Reduce Action</a>
+<ul>
+<li><a href="#a3.2.2.1_Adding_Files_and_Archives_for_the_Job">3.2.2.1 Adding
Files and Archives for the Job</a></li>
+<li><a
href="#a3.2.2.2_Configuring_the_MapReduce_action_with_Java_code">3.2.2.2
Configuring the MapReduce action with Java code</a></li>
+<li><a href="#a3.2.2.3_Streaming">3.2.2.3 Streaming</a></li>
+<li><a href="#a3.2.2.4_Pipes">3.2.2.4 Pipes</a></li>
+<li><a href="#a3.2.2.5_Syntax">3.2.2.5 Syntax</a></li></ul></li>
+<li><a href="#a3.2.3_Pig_Action">3.2.3 Pig Action</a></li>
+<li><a href="#a3.2.4_Fs_HDFS_action">3.2.4 Fs (HDFS) action</a></li>
+<li><a href="#a3.2.5_Sub-workflow_Action">3.2.5 Sub-workflow Action</a></li>
+<li><a href="#a3.2.6_Java_Action">3.2.6 Java Action</a>
+<ul>
+<li><a href="#a3.2.6.1_Overriding_an_actions_Main_class">3.2.6.1 Overriding an
action’s Main class</a></li></ul></li></ul></li></ul></li>
+<li><a href="#a4_Parameterization_of_Workflows">4 Parameterization of
Workflows</a>
+<ul>
+<li><a href="#a4.1_Workflow_Job_Properties_or_Parameters">4.1 Workflow Job
Properties (or Parameters)</a></li>
+<li><a href="#a4.2_Expression_Language_Functions">4.2 Expression Language
Functions</a>
+<ul>
+<li><a href="#a4.2.1_Basic_EL_Constants">4.2.1 Basic EL Constants</a></li>
+<li><a href="#a4.2.2_Basic_EL_Functions">4.2.2 Basic EL Functions</a></li>
+<li><a href="#a4.2.3_Workflow_EL_Functions">4.2.3 Workflow EL
Functions</a></li>
+<li><a href="#a4.2.4_Hadoop_EL_Constants">4.2.4 Hadoop EL Constants</a></li>
+<li><a href="#a4.2.5_Hadoop_EL_Functions">4.2.5 Hadoop EL Functions</a></li>
+<li><a href="#a4.2.6_Hadoop_Jobs_EL_Function">4.2.6 Hadoop Jobs EL
Function</a></li>
+<li><a href="#a4.2.7_HDFS_EL_Functions">4.2.7 HDFS EL Functions</a></li>
+<li><a href="#a4.2.8_HCatalog_EL_Functions">4.2.8 HCatalog EL
Functions</a></li></ul></li></ul></li>
+<li><a href="#a5_Workflow_Notifications">5 Workflow Notifications</a>
+<ul>
+<li><a href="#a5.1_Workflow_Job_Status_Notification">5.1 Workflow Job Status
Notification</a></li>
+<li><a href="#a5.2_Node_Start_and_End_Notifications">5.2 Node Start and End
Notifications</a></li></ul></li>
+<li><a href="#a6_User_Propagation">6 User Propagation</a></li>
+<li><a href="#a7_Workflow_Application_Deployment">7 Workflow Application
Deployment</a></li>
+<li><a href="#a8_External_Data_Assumptions">8 External Data
Assumptions</a></li>
+<li><a href="#a9_Workflow_Jobs_Lifecycle">9 Workflow Jobs Lifecycle</a>
+<ul>
+<li><a href="#a9.1_Workflow_Job_Lifecycle">9.1 Workflow Job Lifecycle</a></li>
+<li><a href="#a9.2_Workflow_Action_Lifecycle">9.2 Workflow Action
Lifecycle</a></li></ul></li>
+<li><a href="#a10_Workflow_Jobs_Recovery_re-run">10 Workflow Jobs Recovery
(re-run)</a></li>
+<li><a href="#a11_Oozie_Web_Services_API">11 Oozie Web Services API</a></li>
+<li><a href="#a12_Client_API">12 Client API</a></li>
+<li><a href="#a13_Command_Line_Tools">13 Command Line Tools</a></li>
+<li><a href="#a14_Web_UI_Console">14 Web UI Console</a></li>
+<li><a href="#a15_Customizing_Oozie_with_Extensions">15 Customizing Oozie with
Extensions</a></li>
+<li><a href="#a16_Workflow_Jobs_Priority">16 Workflow Jobs Priority</a></li>
+<li><a
href="#a17_HDFS_Share_Libraries_for_Workflow_Applications_since_Oozie_2.3">17
HDFS Share Libraries for Workflow Applications (since Oozie 2.3)</a>
+<ul>
+<li><a href="#a17.1_Action_Share_Library_Override_since_Oozie_3.3">17.1 Action
Share Library Override (since Oozie 3.3)</a></li>
+<li><a href="#a17.2_Action_Share_Library_Exclude_since_Oozie_5.2">17.2 Action
Share Library Exclude (since Oozie 5.2)</a></li></ul></li>
+<li><a href="#a18_User-Retry_for_Workflow_Actions_since_Oozie_3.1">18
User-Retry for Workflow Actions (since Oozie 3.1)</a></li>
+<li><a href="#a19_Global_Configurations">19 Global Configurations</a></li>
+<li><a href="#a20_Suspend_On_Nodes">20 Suspend On Nodes</a></li>
+<li><a href="#Appendixes">Appendixes</a>
+<ul>
+<li><a href="#Appendix_A_Oozie_Workflow_and_Common_XML_Schemas">Appendix A,
Oozie Workflow and Common XML Schemas</a>
+<ul>
+<li><a href="#Oozie_Workflow_Schema_Version_1.0">Oozie Workflow Schema Version
1.0</a></li>
+<li><a href="#Oozie_Common_Schema_Version_1.0">Oozie Common Schema Version
1.0</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.5">Oozie Workflow Schema Version
0.5</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.4.5">Oozie Workflow Schema
Version 0.4.5</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.4">Oozie Workflow Schema Version
0.4</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.3">Oozie Workflow Schema Version
0.3</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.2.5">Oozie Workflow Schema
Version 0.2.5</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.2">Oozie Workflow Schema Version
0.2</a></li>
+<li><a href="#Oozie_SLA_Version_0.2">Oozie SLA Version 0.2</a></li>
+<li><a href="#Oozie_SLA_Version_0.1">Oozie SLA Version 0.1</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.1">Oozie Workflow Schema Version
0.1</a></li></ul></li>
+<li><a href="#Appendix_B_Workflow_Examples">Appendix B, Workflow Examples</a>
+<ul>
+<li><a href="#Fork_and_Join_Example">Fork and Join
Example</a></li></ul></li></ul></li></ul>
+
+<div class="section">
+<h2><a name="Changelog"></a>Changelog</h2>
+<p><b>2016FEB19</b></p>
+<ul>
+
+<li>3.2.7 Updated notes on System.exit(int n) behavior</li>
+</ul>
+<p><b>2015APR29</b></p>
+<ul>
+
+<li>3.2.1.4 Added notes about Java action retries</li>
+<li>3.2.7 Added notes about Java action retries</li>
+</ul>
+<p><b>2014MAY08</b></p>
+<ul>
+
+<li>3.2.2.4 Added support for fully qualified job-xml path</li>
+</ul>
+<p><b>2013JUL03</b></p>
+<ul>
+
+<li>Appendix A, Added new workflow schema 0.5 and SLA schema 0.2</li>
+</ul>
+<p><b>2012AUG30</b></p>
+<ul>
+
+<li>4.2.2 Added two EL functions (replaceAll and appendAll)</li>
+</ul>
+<p><b>2012JUL26</b></p>
+<ul>
+
+<li>Appendix A, updated XML schema 0.4 to include <tt>parameters</tt>
element</li>
+<li>4.1 Updated to mention about <tt>parameters</tt> element as of schema
0.4</li>
+</ul>
+<p><b>2012JUL23</b></p>
+<ul>
+
+<li>Appendix A, updated XML schema 0.4 (Fs action)</li>
+<li>3.2.4 Updated to mention that a <tt>name-node</tt>, a <tt>job-xml</tt>,
and a <tt>configuration</tt> element are allowed in the Fs action as of schema
0.4</li>
+</ul>
+<p><b>2012JUN19</b></p>
+<ul>
+
+<li>Appendix A, added XML schema 0.4</li>
+<li>3.2.2.4 Updated to mention that multiple <tt>job-xml</tt> elements are
allowed as of schema 0.4</li>
+<li>3.2.3 Updated to mention that multiple <tt>job-xml</tt> elements are
allowed as of schema 0.4</li>
+</ul>
+<p><b>2011AUG17</b></p>
+<ul>
+
+<li>3.2.4 fs ‘chmod’ xml closing element typo in Example
corrected</li>
+</ul>
+<p><b>2011AUG12</b></p>
+<ul>
+
+<li>3.2.4 fs ‘move’ action characteristics updated, to allow for
consistent source and target paths and existing target path only if
directory</li>
+<li>18, Update the doc for user-retry of workflow action.</li>
+</ul>
+<p><b>2011FEB19</b></p>
+<ul>
+
+<li>10, Update the doc to rerun from the failed node.</li>
+</ul>
+<p><b>2010OCT31</b></p>
+<ul>
+
+<li>17, Added new section on Shared Libraries</li>
+</ul>
+<p><b>2010APR27</b></p>
+<ul>
+
+<li>3.2.3 Added new “arguments” tag to PIG actions</li>
+<li>3.2.5 SSH actions are deprecated in Oozie schema 0.1 and removed in Oozie
schema 0.2</li>
+<li>Appendix A, Added schema version 0.2</li>
+</ul>
+<p><b>2009OCT20</b></p>
+<ul>
+
+<li>Appendix A, updated XML schema</li>
+</ul>
+<p><b>2009SEP15</b></p>
+<ul>
+
+<li>3.2.6 Removing support for sub-workflow in a different Oozie instance
(removing the ‘oozie’ element)</li>
+</ul>
+<p><b>2009SEP07</b></p>
+<ul>
+
+<li>3.2.2.3 Added Map Reduce Pipes specifications.</li>
+<li>3.2.2.4 Map-Reduce Examples. Previously was 3.2.2.3.</li>
+</ul>
+<p><b>2009SEP02</b></p>
+<ul>
+
+<li>10 Added missing skip nodes property name.</li>
+<li>3.2.1.4 Reworded action recovery explanation.</li>
+</ul>
+<p><b>2009AUG26</b></p>
+<ul>
+
+<li>3.2.9 Added <tt>java</tt> action type</li>
+<li>3.1.4 Example uses EL constant to refer to counter group/name</li>
+</ul>
+<p><b>2009JUN09</b></p>
+<ul>
+
+<li>12.2.4 Added build version resource to admin end-point</li>
+<li>3.2.6 Added flag to propagate workflow configuration to sub-workflows</li>
+<li>10 Added behavior for workflow job parameters given in the rerun</li>
+<li>11.3.4 workflows info returns pagination information</li>
+</ul>
+<p><b>2009MAY18</b></p>
+<ul>
+
+<li>3.1.4 decision node, ‘default’ element, ‘name’
attribute changed to ‘to’</li>
+<li>3.1.5 fork node, ‘transition’ element changed to
‘start’, ‘to’ attribute change to
‘path’</li>
+<li>3.1.5 join node, ‘transition’ element remove, added
‘to’ attribute to ‘join’ element</li>
+<li>3.2.1.4 Rewording on action recovery section</li>
+<li>3.2.2 map-reduce action, added ‘job-tracker’ and
‘name-node’ elements, and ‘file’ and
‘archive’ elements</li>
+<li>3.2.2.1 map-reduce action, removed the ‘file’ and
‘archive’ elements from the ‘streaming’ element</li>
+<li>3.2.2.2 map-reduce action, reorganized streaming section</li>
+<li>3.2.3 pig action, removed information about implementation (SSH), changed
elements names</li>
+<li>3.2.4 fs action, removed ‘fs-uri’ and
‘user-name’ elements, file system URI is now specified in path,
user is propagated</li>
+<li>3.2.6 sub-workflow action, renamed elements ‘oozie-url’ to
‘oozie’ and ‘workflow-app’ to
‘app-path’</li>
+<li>4 Properties that are valid Java identifiers can be used as ${NAME}</li>
+<li>4.1 Renamed default properties file from ‘configuration.xml’
to ‘default-configuration.xml’</li>
+<li>4.2 Changes in EL Constants and Functions</li>
+<li>5 Updated notification behavior and tokens</li>
+<li>6 Changed user propagation behavior</li>
+<li>7 Changed application packaging from ZIP to HDFS directory</li>
+<li>Removed application lifecycle and self containment model sections</li>
+<li>10 Changed workflow job recovery, simplified recovery behavior</li>
+<li>11 Detailed Web Services API</li>
+<li>12 Updated Client API section</li>
+<li>15 Updated Action Executor API section</li>
+<li>Appendix A XML namespace updated to
‘uri:oozie:workflow:0.1’</li>
+<li>Appendix A Updated XML schema to changes in map-reduce/pig/fs/ssh
actions</li>
+<li>Appendix B Updated workflow example to schema changes</li>
+</ul>
+<p><b>2009MAR25</b></p>
+<ul>
+
+<li>Changing all references of HWS to Oozie (project name)</li>
+<li>Typos, XML Formatting</li>
+<li>XML Schema URI correction</li>
+</ul>
+<p><b>2009MAR09</b></p>
+<ul>
+
+<li>Changed <tt>CREATED</tt> job state to <tt>PREP</tt> to have same states as
Hadoop</li>
+<li>Renamed ‘hadoop-workflow’ element to
‘workflow-app’</li>
+<li>Decision syntax changed to be ‘switch/case’ with no
transition indirection</li>
+<li>Action nodes common root element ‘action’, with the action
type as sub-element (using a single built-in XML schema)</li>
+<li>Action nodes have 2 explicit transitions ‘ok to’ and
‘error to’ enforced by XML schema</li>
+<li>Renamed ‘fail’ action element to ‘kill’</li>
+<li>Renamed ‘hadoop’ action element to
‘map-reduce’</li>
+<li>Renamed ‘hdfs’ action element to ‘fs’</li>
+<li>Updated all XML snippets and examples</li>
+<li>Made user propagation simpler and consistent</li>
+<li>Added Oozie XML schema to Appendix A</li>
+<li>Added workflow example to Appendix B</li>
+</ul>
+<p><b>2009FEB22</b></p>
+<ul>
+
+<li>Opened <a class="externalLink"
href="https://issues.apache.org/jira/browse/HADOOP-5303">JIRA
HADOOP-5303</a></li>
+</ul>
+<p><b>2012DEC27</b></p>
+<ul>
+
+<li>Added information on dropping hcatalog table partitions in prepare
block</li>
+<li>Added hcatalog EL functions section</li>
+</ul></div>
+<div class="section">
+<h2><a name="a0_Definitions"></a>0 Definitions</h2>
+<p><b>Action:</b> An execution/computation task (a Map-Reduce job, a Pig job, a
shell command). It can also be referred to as a task or an ‘action
node’.</p>
+<p><b>Workflow:</b> A collection of actions arranged in a control dependency
DAG (Directed Acyclic Graph). “control dependency” from one
action to another means that the second action can’t run until the first
action has completed.</p>
+<p><b>Workflow Definition:</b> A programmatic description of a workflow that
can be executed.</p>
+<p><b>Workflow Definition Language:</b> The language used to define a Workflow
Definition.</p>
+<p><b>Workflow Job:</b> An executable instance of a workflow definition.</p>
+<p><b>Workflow Engine:</b> A system that executes workflow jobs. It can also
be referred to as a DAG engine.</p></div>
+<div class="section">
+<h2><a name="a1_Specification_Highlights"></a>1 Specification Highlights</h2>
+<p>A Workflow application is a DAG that coordinates the following types of
actions: Hadoop, Pig, and sub-workflows.</p>
+<p>Flow control operations within the workflow applications can be done using
decision, fork and join nodes. Cycles in workflows are not supported.</p>
+<p>Actions and decisions can be parameterized with job properties, action
output (i.e. Hadoop counters) and file information (file exists, file size,
etc.). Formal parameters are expressed in the workflow definition as
<tt>${VAR}</tt> variables.</p>
+<p>A Workflow application is a ZIP file that contains the workflow definition
(an XML file) and all the necessary files to run its actions: JAR files for
Map/Reduce jobs, shells for streaming Map/Reduce jobs, native libraries, Pig
scripts, and other resource files.</p>
+<p>Before running a workflow job, the corresponding workflow application must
be deployed in Oozie.</p>
+<p>Deploying workflow applications and running workflow jobs can be done via
command line tools, a WS API and a Java API.</p>
+<p>Monitoring the system and workflow jobs can be done via a web console,
command line tools, a WS API and a Java API.</p>
+<p>When submitting a workflow job, a set of properties resolving all the
formal parameters in the workflow definitions must be provided. This set of
properties is a Hadoop configuration.</p>
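+<p>As a sketch, if a workflow definition uses the formal parameters
+<tt>${inputDir}</tt> and <tt>${outputDir}</tt> (made-up names for
+illustration), the properties provided at submission time must resolve
+both:</p>
+
+<div>
+<div>
+<pre class="source">inputDir=hdfs://bar:8020/usr/joe/input-data
+outputDir=hdfs://bar:8020/usr/joe/output-data
+</pre></div></div>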
+<p>Possible states for a workflow job are: <tt>PREP</tt>, <tt>RUNNING</tt>,
<tt>SUSPENDED</tt>, <tt>SUCCEEDED</tt>, <tt>KILLED</tt> and <tt>FAILED</tt>.</p>
+<p>In the case of an action start failure in a workflow job, depending on the
type of failure, Oozie will attempt an automatic retry, request a manual retry,
or fail the workflow job.</p>
+<p>Oozie can make HTTP callback notifications on action start/end/failure
events and workflow end/failure events.</p>
+<p>In the case of workflow job failure, the workflow job can be resubmitted
skipping previously completed actions. Before doing a resubmission, the
workflow application can be updated with a patch to fix a problem in the
workflow application code.</p>
+<p><a name="WorkflowDefinition"></a></p></div>
+<div class="section">
+<h2><a name="a2_Workflow_Definition"></a>2 Workflow Definition</h2>
+<p>A workflow definition is a DAG with control flow nodes (start, end,
decision, fork, join, kill) and action nodes (map-reduce, pig, etc.); nodes are
connected by transition arrows.</p>
+<p>The workflow definition language is XML based and it is called hPDL (Hadoop
Process Definition Language).</p>
+<p>Refer to Appendix A for the <a
href="WorkflowFunctionalSpec.html#OozieWFSchema">Oozie Workflow Definition XML
Schema</a>. Appendix B has <a
href="WorkflowFunctionalSpec.html#OozieWFExamples">Workflow Definition
Examples</a>.</p>
+<div class="section">
+<h3><a name="a2.1_Cycles_in_Workflow_Definitions"></a>2.1 Cycles in Workflow
Definitions</h3>
+<p>Oozie does not support cycles in workflow definitions; a workflow
definition must be a strict DAG.</p>
+<p>At workflow application deployment time, if Oozie detects a cycle in the
workflow definition it must fail the deployment.</p></div></div>
+<div class="section">
+<h2><a name="a3_Workflow_Nodes"></a>3 Workflow Nodes</h2>
+<p>Workflow nodes are classified into control flow nodes and action nodes:</p>
+<ul>
+
+<li><b>Control flow nodes:</b> nodes that control the start and end of the
workflow and workflow job execution path.</li>
+<li><b>Action nodes:</b> nodes that trigger the execution of a
computation/processing task.</li>
+</ul>
+<p>Node names and transitions must conform to the following pattern
<tt>[a-zA-Z][\-_a-zA-Z0-9]*</tt> and may be up to 20 characters long.</p>
+<div class="section">
+<h3><a name="a3.1_Control_Flow_Nodes"></a>3.1 Control Flow Nodes</h3>
+<p>Control flow nodes define the beginning and the end of a workflow (the
<tt>start</tt>, <tt>end</tt> and <tt>kill</tt> nodes) and provide a mechanism
to control the workflow execution path (the <tt>decision</tt>, <tt>fork</tt>
and <tt>join</tt> nodes).</p>
+<p><a name="StartNode"></a></p>
+<div class="section">
+<h4><a name="a3.1.1_Start_Control_Node"></a>3.1.1 Start Control Node</h4>
+<p>The <tt>start</tt> node is the entry point for a workflow job; it indicates
the first workflow node the workflow job must transition to.</p>
+<p>When a workflow is started, it automatically transitions to the node
specified in the <tt>start</tt> node.</p>
+<p>A workflow definition must have one <tt>start</tt> node.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <start to="[NODE-NAME]"/>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>to</tt> attribute is the name of the first workflow node to execute.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="foo-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <start to="firstHadoopJob"/>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><a name="EndNode"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.2_End_Control_Node"></a>3.1.2 End Control Node</h4>
+<p>The <tt>end</tt> node is the end of a workflow job; it indicates that the
workflow job has completed successfully.</p>
+<p>When a workflow job reaches the <tt>end</tt> node it finishes successfully
(SUCCEEDED).</p>
+<p>If one or more actions started by the workflow job are executing when the
<tt>end</tt> node is reached, the actions will be killed. In this scenario the
workflow job is still considered to have run successfully.</p>
+<p>A workflow definition must have one <tt>end</tt> node.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <end name="[NODE-NAME]"/>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute is the name of the transition to take to end
the workflow job.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="foo-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <end name="end"/>
+</workflow-app>
+</pre></div></div>
+
+<p><a name="KillNode"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.3_Kill_Control_Node"></a>3.1.3 Kill Control Node</h4>
+<p>The <tt>kill</tt> node allows a workflow job to kill itself.</p>
+<p>When a workflow job reaches the <tt>kill</tt> node it finishes in error
(KILLED).</p>
+<p>If one or more actions started by the workflow job are executing when the
<tt>kill</tt> node is reached, the actions will be killed.</p>
+<p>A workflow definition may have zero or more <tt>kill</tt> nodes.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <kill name="[NODE-NAME]">
+ <message>[MESSAGE-TO-LOG]</message>
+ </kill>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute in the <tt>kill</tt> node is the name of the
kill node.</p>
+<p>The content of the <tt>message</tt> element will be logged as the kill
reason for the workflow job.</p>
+<p>A <tt>kill</tt> node does not have transition elements because it ends the
workflow job, as <tt>KILLED</tt>.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="foo-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <kill name="killBecauseNoInput">
+ <message>Input unavailable</message>
+ </kill>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><a name="DecisionNode"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.4_Decision_Control_Node"></a>3.1.4 Decision Control Node</h4>
+<p>A <tt>decision</tt> node enables a workflow to make a selection on the
execution path to follow.</p>
+<p>The behavior of a <tt>decision</tt> node can be seen as a switch-case
statement.</p>
+<p>A <tt>decision</tt> node consists of a list of predicate-transition pairs
plus a default transition. Predicates are evaluated in order of appearance
until one of them evaluates to <tt>true</tt>, and the corresponding transition
is taken. If none of the predicates evaluates to <tt>true</tt>, the
<tt>default</tt> transition is taken.</p>
+<p>Predicates are JSP Expression Language (EL) expressions (refer to section
4.2 of this document) that resolve into a boolean value, <tt>true</tt> or
<tt>false</tt>. For example:</p>
+
+<div>
+<div>
+<pre class="source"> ${fs:fileSize('/usr/foo/myinputdir') gt 10 * GB}
+</pre></div></div>
+
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <decision name="[NODE-NAME]">
+ <switch>
+ <case to="[NODE_NAME]">[PREDICATE]</case>
+ ...
+ <case to="[NODE_NAME]">[PREDICATE]</case>
+ <default to="[NODE_NAME]"/>
+ </switch>
+ </decision>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute in the <tt>decision</tt> node is the name of
the decision node.</p>
+<p>Each <tt>case</tt> element contains a predicate and a transition name. The
predicate ELs are evaluated in order until one returns <tt>true</tt> and the
corresponding transition is taken.</p>
+<p>The <tt>default</tt> element indicates the transition to take if none of
the predicates evaluates to <tt>true</tt>.</p>
+<p>All decision nodes must have a <tt>default</tt> element to avoid bringing
the workflow into an error state if none of the predicates evaluates to
true.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="foo-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <decision name="mydecision">
+ <switch>
+ <case to="reconsolidatejob">
+ ${fs:fileSize(secondjobOutputDir) gt 10 * GB}
+ </case>
+ <case to="rexpandjob">
+ ${fs:fileSize(secondjobOutputDir) lt 100 * MB}
+ </case>
+ <case to="recomputejob">
+ ${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
+ </case>
+ <default to="end"/>
+ </switch>
+ </decision>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><a name="ForkJoinNodes"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.5_Fork_and_Join_Control_Nodes"></a>3.1.5 Fork and Join
Control Nodes</h4>
+<p>A <tt>fork</tt> node splits one path of execution into multiple concurrent
paths of execution.</p>
+<p>A <tt>join</tt> node waits until every concurrent execution path of a
previous <tt>fork</tt> node arrives at it.</p>
+<p>The <tt>fork</tt> and <tt>join</tt> nodes must be used in pairs. The
<tt>join</tt> node assumes concurrent execution paths are children of the same
<tt>fork</tt> node.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <fork name="[FORK-NODE-NAME]">
+ <path start="[NODE-NAME]" />
+ ...
+ <path start="[NODE-NAME]" />
+ </fork>
+ ...
+ <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute in the <tt>fork</tt> node is the name of the
workflow fork node. The <tt>start</tt> attribute in each <tt>path</tt> element
in the <tt>fork</tt> node indicates the name of a workflow node that will be
part of the concurrent execution paths.</p>
+<p>The <tt>name</tt> attribute in the <tt>join</tt> node is the name of the
workflow join node. The <tt>to</tt> attribute in the <tt>join</tt> node
indicates the name of the workflow node that will be executed after all
concurrent execution paths of the corresponding fork arrive at the join
node.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <fork name="forking">
+ <path start="firstparalleljob"/>
+ <path start="secondparalleljob"/>
+ </fork>
+ <action name="firstparalleljob">
+ <map-reduce>
+ <resource-manager>foo:8032</resource-manager>
+ <name-node>bar:8020</name-node>
+ <job-xml>job1.xml</job-xml>
+ </map-reduce>
+ <ok to="joining"/>
+ <error to="kill"/>
+ </action>
+ <action name="secondparalleljob">
+ <map-reduce>
+ <resource-manager>foo:8032</resource-manager>
+ <name-node>bar:8020</name-node>
+ <job-xml>job2.xml</job-xml>
+ </map-reduce>
+ <ok to="joining"/>
+ <error to="kill"/>
+ </action>
+ <join name="joining" to="nextaction"/>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>By default, Oozie performs some validation that any forking in a workflow
is valid and won’t lead to any incorrect behavior or instability.
However, if Oozie is preventing a workflow from being submitted and you are
very certain that it should work, you can disable forkjoin validation so that
Oozie will accept the workflow. To disable this validation just for a specific
workflow, set <tt>oozie.wf.validate.ForkJoin</tt> to <tt>false</tt> in the
job.properties file. To disable this validation for all workflows, set
<tt>oozie.validate.ForkJoin</tt> to <tt>false</tt> in the oozie-site.xml file.
Whether the validation runs is the AND of these two properties: it runs only if
both are <tt>true</tt> (or not specified), and is skipped if either or both are
set to <tt>false</tt>.</p>
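+<p>For example, to skip the validation for a single workflow submission, the
+property can be set in that job's <tt>job.properties</tt> file (the other
+properties shown are ordinary submission properties and are illustrative):</p>
+
+<div>
+<div>
+<pre class="source">nameNode=hdfs://bar:8020
+resourceManager=foo:8032
+oozie.wf.application.path=${nameNode}/user/${user.name}/my-wf
+# accept this workflow even if fork/join validation would reject it
+oozie.wf.validate.ForkJoin=false
+</pre></div></div>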
+<p><a name="ActionNodes"></a></p></div></div>
+<div class="section">
+<h3><a name="a3.2_Workflow_Action_Nodes"></a>3.2 Workflow Action Nodes</h3>
+<p>Action nodes are the mechanism by which a workflow triggers the execution
of a computation/processing task.</p>
+<div class="section">
+<h4><a name="a3.2.1_Action_Basis"></a>3.2.1 Action Basis</h4>
+<p>The following sub-sections define common behavior and capabilities for all
action types.</p>
+<div class="section">
+<h5><a
name="a3.2.1.1_Action_ComputationProcessing_Is_Always_Remote"></a>3.2.1.1
Action Computation/Processing Is Always Remote</h5>
+<p>All computation/processing tasks triggered by an action node are remote to
Oozie. No workflow application specific computation/processing task is executed
within Oozie.</p></div>
+<div class="section">
+<h5><a name="a3.2.1.2_Actions_Are_Asynchronous"></a>3.2.1.2 Actions Are
Asynchronous</h5>
+<p>All computation/processing tasks triggered by an action node are executed
asynchronously by Oozie. For most types of computation/processing tasks
triggered by a workflow action, the workflow job has to wait until the
computation/processing task completes before transitioning to the following
node in the workflow.</p>
+<p>The exception is the <tt>fs</tt> action, which is handled as a synchronous
action.</p>
+<p>Oozie can detect completion of computation/processing tasks by two
different means, callbacks and polling.</p>
+<p>When a computation/processing task is started by Oozie, Oozie provides a
unique callback URL to the task; the task should invoke the given URL to notify
its completion.</p>
+<p>For cases where the task fails to invoke the callback URL for any reason
(e.g. a transient network failure), or when the type of task cannot invoke the
callback URL upon completion, Oozie has a mechanism to poll
computation/processing tasks for completion.</p></div>
+<div class="section">
+<h5><a name="a3.2.1.3_Actions_Have_2_Transitions_ok_and_error"></a>3.2.1.3
Actions Have 2 Transitions, <tt>ok</tt> and <tt>error</tt></h5>
+<p>If a computation/processing task -triggered by a workflow- completes
successfully, it transitions to <tt>ok</tt>.</p>
+<p>If a computation/processing task -triggered by a workflow- fails to
complete successfully, it transitions to <tt>error</tt>.</p>
+<p>If a computation/processing task exits in error, the task must provide
<tt>error-code</tt> and <tt>error-message</tt> information to Oozie. This
information can be used from <tt>decision</tt> nodes to implement fine-grained
error handling at the workflow application level.</p>
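+<p>For instance, a workflow can combine the <tt>error</tt> transition with a
+<tt>decision</tt> node to route on the error code of a failed action. The
+sketch below is illustrative: the node names and the error code
+<tt>JA009</tt> are made-up placeholders, while <tt>wf:errorCode()</tt> is the
+workflow EL function that returns the error code of the given action node:</p>
+
+<div>
+<div>
+<pre class="source">    <action name="myjob">
+        ...
+        <ok to="nextjob"/>
+        <error to="check-error"/>
+    </action>
+    <decision name="check-error">
+        <switch>
+            <case to="notify-admin">${wf:errorCode("myjob") eq "JA009"}</case>
+            <default to="fail"/>
+        </switch>
+    </decision>
+</pre></div></div>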
+<p>Each action type must clearly define all the error codes it can
produce.</p></div>
+<div class="section">
+<h5><a name="a3.2.1.4_Action_Recovery"></a>3.2.1.4 Action Recovery</h5>
+<p>Oozie provides recovery capabilities when starting or ending actions.</p>
+<p>Once an action starts successfully, Oozie will not retry starting the
action if the action fails during its execution. The assumption is that the
external system (e.g. Hadoop) executing the action has enough resilience to
recover jobs once they have started (e.g. Hadoop task retries).</p>
+<p>Java actions are a special case with regard to retries. Although Oozie
itself does not retry Java actions should they fail after they have
successfully started, Hadoop itself can cause the action to be restarted due to
a map task retry on the map task running the Java application. See the Java
Action section below for more detail.</p>
+<p>For failures that occur prior to the start of the job, Oozie will have
different recovery strategies depending on the nature of the failure.</p>
+<p>If the failure is of a transient nature, Oozie will perform retries after a
pre-defined time interval. The number of retries and the time interval for a
type of action must be pre-configured at the Oozie level. Workflow jobs can
override such configuration.</p>
+<p>Examples of transient failures are network problems or a remote system
being temporarily unavailable.</p>
+<p>If the failure is of a non-transient nature, Oozie will suspend the
workflow job until a manual or programmatic intervention resumes the workflow
job and the action start or end is retried. It is the responsibility of an
administrator or an external managing system to perform any necessary cleanup
before resuming the workflow job.</p>
+<p>If the failure is an error and a retry will not resolve the problem, Oozie
will perform the error transition for the action.</p>
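+<p>As a sketch of the per-job override mentioned above, the user-retry
+attributes <tt>retry-max</tt> and <tt>retry-interval</tt> can be set on an
+<tt>action</tt> element; the values below are illustrative, and attribute
+support depends on the workflow schema version in use:</p>
+
+<div>
+<div>
+<pre class="source">    <action name="myjob" retry-max="3" retry-interval="1">
+        ...
+    </action>
+</pre></div></div>
+
+<p>With this configuration, a transient failure of <tt>myjob</tt> would be
+retried up to 3 times, waiting 1 minute between attempts.</p>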
+<p><a name="MapReduceAction"></a></p></div></div>
+<div class="section">
+<h4><a name="a3.2.2_Map-Reduce_Action"></a>3.2.2 Map-Reduce Action</h4>
+<p>The <tt>map-reduce</tt> action starts a Hadoop map/reduce job from a
workflow. Hadoop jobs can be Java Map/Reduce jobs or streaming jobs.</p>
+<p>A <tt>map-reduce</tt> action can be configured to perform file system
cleanup and directory creation before starting the map reduce job. This
capability enables Oozie to retry a Hadoop job in the situation of a transient
failure (Hadoop checks the non-existence of the job output directory and then
creates it when the Hadoop job is starting, thus a retry without cleanup of the
job output directory would fail).</p>
+<p>The workflow job will wait until the Hadoop map/reduce job completes before
continuing to the next action in the workflow execution path.</p>
+<p>The counters of the Hadoop job and the job exit status (<tt>FAILED</tt>,
<tt>KILLED</tt> or <tt>SUCCEEDED</tt>) must be available to the workflow job
after the Hadoop job ends. This information can be used from within decision
nodes and other action configurations.</p>
+<p>The <tt>map-reduce</tt> action has to be configured with all the necessary
Hadoop JobConf properties to run the Hadoop map/reduce job.</p>
+<p>Hadoop JobConf properties can be specified as part of</p>
+<ul>
+
+<li>the <tt>config-default.xml</tt> or</li>
+<li>JobConf XML file bundled with the workflow application or</li>
+<li><global> tag in workflow definition or</li>
+<li>Inline <tt>map-reduce</tt> action configuration or</li>
+<li>An implementation of OozieActionConfigurator specified by the
<config-class> tag in workflow definition.</li>
+</ul>
+<p>The configuration properties are loaded in the above order, i.e.
<tt>streaming</tt>, <tt>job-xml</tt>, <tt>configuration</tt>, and
<tt>config-class</tt>; later values override earlier values.</p>
+<p>Streaming and inline property values can be parameterized (templatized)
using EL expressions.</p>
+<p>The Hadoop <tt>mapred.job.tracker</tt> and <tt>fs.default.name</tt>
properties must not be present in the job-xml and inline configuration.</p>
+<p><a name="FilesArchives"></a></p>
+<div class="section">
+<h5><a name="a3.2.2.1_Adding_Files_and_Archives_for_the_Job"></a>3.2.2.1
Adding Files and Archives for the Job</h5>
+<p>The <tt>file</tt> and <tt>archive</tt> elements make files and archives
available to map-reduce jobs. If the specified path is relative, the file or
archive is assumed to be within the application directory, in the corresponding
sub-path. If the path is absolute, the file or archive is expected at the given
absolute path.</p>
+<p>Files specified with the <tt>file</tt> element, will be symbolic links in
the home directory of the task.</p>
+<p>If a file is a native library (an ‘.so’ or a
‘.so.#’ file), it will be symlinked as an ‘.so’
file in the task running directory, thus available to the task JVM.</p>
+<p>To force a symlink for a file on the task running directory, use a
‘#’ followed by the symlink name. For example
‘mycat.sh#cat’.</p>
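+<p>A minimal sketch of both cases (the paths below are illustrative): a
+relative <tt>file</tt> path resolved against the application directory, with a
+forced symlink name, and an absolute <tt>archive</tt> path:</p>
+
+<div>
+<div>
+<pre class="source">    <map-reduce>
+        ...
+        <!-- relative to the application directory; symlinked as 'cat' -->
+        <file>scripts/mycat.sh#cat</file>
+        <!-- absolute path on HDFS -->
+        <archive>/user/tucu/mylib.jar</archive>
+    </map-reduce>
+</pre></div></div>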
+<p>Refer to the Hadoop distributed cache documentation for more details on
files and archives.</p></div>
+<div class="section">
+<h5><a
name="a3.2.2.2_Configuring_the_MapReduce_action_with_Java_code"></a>3.2.2.2
Configuring the MapReduce action with Java code</h5>
+<p>Java code can be used to further configure the MapReduce action. This can
be useful if you already have “driver” code for your MapReduce
action, if you’re more familiar with MapReduce’s Java API, if
there’s some configuration that requires logic, or some configuration
that’s difficult to do in straight XML (e.g. Avro).</p>
+<p>Create a class that implements the
org.apache.oozie.action.hadoop.OozieActionConfigurator interface from the
“oozie-sharelib-oozie” artifact. It contains a single method
that receives a <tt>JobConf</tt> as an argument. Any configuration properties
set on this <tt>JobConf</tt> will be used by the MapReduce action.</p>
+<p>The OozieActionConfigurator has this signature:</p>
+
+<div>
+<div>
+<pre class="source">public interface OozieActionConfigurator {
+ public void configure(JobConf actionConf) throws
OozieActionConfiguratorException;
+}
+</pre></div></div>
+
+<p>where <tt>actionConf</tt> is the <tt>JobConf</tt> you can update. If you
need to throw an Exception, you can wrap it in an
<tt>OozieActionConfiguratorException</tt>, also in the
“oozie-sharelib-oozie” artifact.</p>
+<p>For example:</p>
+
+<div>
+<div>
+<pre class="source">package com.example;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileInputFormat;
+import org.apache.hadoop.mapred.FileOutputFormat;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.oozie.action.hadoop.OozieActionConfigurator;
+import org.apache.oozie.action.hadoop.OozieActionConfiguratorException;
+import org.apache.oozie.example.SampleMapper;
+import org.apache.oozie.example.SampleReducer;
+
+public class MyConfigClass implements OozieActionConfigurator {
+
+ @Override
+ public void configure(JobConf actionConf) throws
OozieActionConfiguratorException {
+ if (actionConf.getUser() == null) {
+ throw new OozieActionConfiguratorException("No user
set");
+ }
+ actionConf.setMapperClass(SampleMapper.class);
+ actionConf.setReducerClass(SampleReducer.class);
+ FileInputFormat.setInputPaths(actionConf, new Path("/user/"
+ actionConf.getUser() + "/input-data"));
+ FileOutputFormat.setOutputPath(actionConf, new Path("/user/"
+ actionConf.getUser() + "/output"));
+ ...
+ }
+}
+</pre></div></div>
+
+<p>To use your config class in your MapReduce action, simply compile it into a
jar, make the jar available to your action, and specify the class name in the
<tt>config-class</tt> element (this requires at least schema 0.5):</p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="[NODE-NAME]">
+ <map-reduce>
+ ...
+ <job-xml>[JOB-XML-FILE]</job-xml>
+ <configuration>
+ <property>
+ <name>[PROPERTY-NAME]</name>
+ <value>[PROPERTY-VALUE]</value>
+ </property>
+ ...
+ </configuration>
+ <config-class>com.example.MyConfigClass</config-class>
+ ...
+ </map-reduce>
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>Another example of this can be found in the “map-reduce”
example that comes with Oozie.</p>
+<p>A useful tip: The initial <tt>JobConf</tt> passed to the <tt>configure</tt>
method includes all of the properties listed in the <tt>configuration</tt>
section of the MR action in a workflow. If you need to pass any information to
your OozieActionConfigurator, you can simply put it there.</p>
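+<p>For example (the property name and value here are illustrative), a value
+placed in the action’s <tt>configuration</tt> section can be read back
+inside the <tt>configure</tt> method with
+<tt>actionConf.get("com.example.myParam")</tt>:</p>
+
+<div>
+<div>
+<pre class="source">    <configuration>
+        <property>
+            <name>com.example.myParam</name>
+            <value>someValue</value>
+        </property>
+    </configuration>
+    <config-class>com.example.MyConfigClass</config-class>
+</pre></div></div>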
+<p><a name="StreamingMapReduceAction"></a></p></div>
+<div class="section">
+<h5><a name="a3.2.2.3_Streaming"></a>3.2.2.3 Streaming</h5>
+<p>Streaming information can be specified in the <tt>streaming</tt>
element.</p>
+<p>The <tt>mapper</tt> and <tt>reducer</tt> elements are used to specify the
executable/script to be used as mapper and reducer.</p>
+<p>User defined scripts must be bundled with the workflow application and they
must be declared in the <tt>files</tt> element of the streaming configuration.
If they are not declared in the <tt>files</tt> element of the configuration, it
is assumed they will be available (and in the command PATH) on the Hadoop slave
machines.</p>
+<p>Some streaming jobs require files found on HDFS to be available to the
mapper/reducer scripts. This is done using the <tt>file</tt> and
<tt>archive</tt> elements described in the previous section.</p>
+<p>The Mapper/Reducer can be overridden by the <tt>mapred.mapper.class</tt> or
<tt>mapred.reducer.class</tt> properties in the <tt>job-xml</tt> file or
<tt>configuration</tt> elements.</p>
+<p><a name="PipesMapReduceAction"></a></p></div>
+<div class="section">
+<h5><a name="a3.2.2.4_Pipes"></a>3.2.2.4 Pipes</h5>
+<p>Pipes information can be specified in the <tt>pipes</tt> element.</p>
+<p>A subset of the command line options available with the Hadoop Pipes
Submitter can be specified via the elements <tt>map</tt>,
<tt>reduce</tt>, <tt>inputformat</tt>, <tt>partitioner</tt>, <tt>writer</tt>
and <tt>program</tt>.</p>
+<p>The <tt>program</tt> element is used to specify the executable/script to be
used.</p>
+<p>The user defined program must be bundled with the workflow application.</p>
+<p>Some pipes jobs require files found on HDFS to be available to the
mapper/reducer scripts. This is done using the <tt>file</tt> and
<tt>archive</tt> elements described in the previous section.</p>
+<p>Pipe properties can be overridden by specifying them in the
<tt>job-xml</tt> file or <tt>configuration</tt> element.</p></div>
+<div class="section">
+<h5><a name="a3.2.2.5_Syntax"></a>3.2.2.5 Syntax</h5>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="[NODE-NAME]">
+ <map-reduce>
+ <resource-manager>[RESOURCE-MANAGER]</resource-manager>
+ <name-node>[NAME-NODE]</name-node>
+ <prepare>
+ <delete path="[PATH]"/>
+ ...
+ <mkdir path="[PATH]"/>
+ ...
+ </prepare>
+ <streaming>
+ <mapper>[MAPPER-PROCESS]</mapper>
+ <reducer>[REDUCER-PROCESS]</reducer>
+
<record-reader>[RECORD-READER-CLASS]</record-reader>
+
<record-reader-mapping>[NAME=VALUE]</record-reader-mapping>
+ ...
+ <env>[NAME=VALUE]</env>
+ ...
+ </streaming>
+ <!-- Either streaming or pipes can be specified for
an action, not both -->
+ <pipes>
+ <map>[MAPPER]</map>
+                    <reduce>[REDUCER]</reduce>
+ <inputformat>[INPUTFORMAT]</inputformat>
+ <partitioner>[PARTITIONER]</partitioner>
+ <writer>[OUTPUTFORMAT]</writer>
+ <program>[EXECUTABLE]</program>
+ </pipes>
+ <job-xml>[JOB-XML-FILE]</job-xml>
+ <configuration>
+ <property>
+ <name>[PROPERTY-NAME]</name>
+ <value>[PROPERTY-VALUE]</value>
+ </property>
+ ...
+ </configuration>
+ <config-class>com.example.MyConfigClass</config-class>
+ <file>[FILE-PATH]</file>
+ ...
+ <archive>[FILE-PATH]</archive>
+ ...
+ </map-reduce>
+
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>prepare</tt> element, if present, indicates a list of paths to
delete before starting the job. This should be used exclusively for directory
cleanup or dropping of hcatalog tables or table partitions for the job to be
executed. The delete operation will be performed in the
<tt>fs.default.name</tt> filesystem for hdfs URIs. The format for specifying a
hcatalog table URI is <tt>hcat://[metastore server]:[port]/[database
name]/[table name]</tt> and the format for specifying a hcatalog table
partition URI is <tt>hcat://[metastore server]:[port]/[database name]/[table
name]/[partkey1]=[value];[partkey2]=[value]</tt>. In case of a hcatalog URI,
the hive-site.xml needs to be shipped using the <tt>file</tt> tag and the
hcatalog and hive jars need to be placed in the workflow lib directory or
specified using the <tt>archive</tt> tag.</p>
+<p>The <tt>job-xml</tt> element, if present, must refer to a Hadoop JobConf
<tt>job.xml</tt> file bundled in the workflow application. By default the
<tt>job.xml</tt> file is taken from the workflow application namenode,
regardless of the namenode specified for the action. To specify a
<tt>job.xml</tt> on another namenode, use a fully qualified file path. The
<tt>job-xml</tt> element is optional and, as of schema 0.4, multiple
<tt>job-xml</tt> elements are allowed in order to specify multiple Hadoop
JobConf <tt>job.xml</tt> files.</p>
+<p>The <tt>configuration</tt> element, if present, contains JobConf properties
for the Hadoop job.</p>
+<p>Properties specified in the <tt>configuration</tt> element override
properties specified in the file specified in the <tt>job-xml</tt> element.</p>
+<p>As of schema 0.5, the <tt>config-class</tt> element, if present, contains a
class that implements OozieActionConfigurator that can be used to further
configure the MapReduce job.</p>
+<p>Properties specified in the <tt>config-class</tt> class override properties
specified in <tt>configuration</tt> element.</p>
+<p>External Stats can be turned on/off by specifying the property
<i>oozie.action.external.stats.write</i> as <i>true</i> or <i>false</i> in the
configuration element of workflow.xml. The default value for this property is
<i>false</i>.</p>
+<p>The <tt>file</tt> element, if present, must specify the target symbolic
link for binaries by separating the original file and target with a #
(file#target-sym-link). This is not required for libraries.</p>
+<p>The <tt>mapper</tt> and <tt>reducer</tt> processes for streaming jobs
should specify the executable command with URL encoding, e.g. ‘%’
should be replaced by ‘%25’.</p>
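+<p>For instance, a streaming mapper command that contains a literal
+‘%’ character would be written with ‘%25’ in its
+place (the command itself is illustrative):</p>
+
+<div>
+<div>
+<pre class="source">    <streaming>
+        <!-- the command actually executed is: /bin/sed s/%/pct/g -->
+        <mapper>/bin/sed s/%25/pct/g</mapper>
+    </streaming>
+</pre></div></div>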
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="foo-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="myfirstHadoopJob">
+ <map-reduce>
+ <resource-manager>foo:8032</resource-manager>
+ <name-node>bar:8020</name-node>
+ <prepare>
+ <delete
path="hdfs://foo:8020/usr/tucu/output-data"/>
+ </prepare>
+ <job-xml>/myfirstjob.xml</job-xml>
+ <configuration>
+ <property>
+ <name>mapred.input.dir</name>
+ <value>/usr/tucu/input-data</value>
+ </property>
+ <property>
+ <name>mapred.output.dir</name>
+                        <value>/usr/tucu/output-data</value>
+ </property>
+ <property>
+ <name>mapred.reduce.tasks</name>
+ <value>${firstJobReducers}</value>
+ </property>
+ <property>
+ <name>oozie.action.external.stats.write</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ </map-reduce>
+ <ok to="myNextAction"/>
+ <error to="errorCleanup"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>In the above example, the number of Reducers to be used by the Map/Reduce
job has to be specified as a parameter of the workflow job configuration when
creating the workflow job.</p>
+<p><b>Streaming Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="firstjob">
+ <map-reduce>
+ <resource-manager>foo:8032</resource-manager>
+ <name-node>bar:8020</name-node>
+ <prepare>
+ <delete path="${output}"/>
+ </prepare>
+ <streaming>
+ <mapper>/bin/bash testarchive/bin/mapper.sh
testfile</mapper>
+ <reducer>/bin/bash
testarchive/bin/reducer.sh</reducer>
+ </streaming>
+ <configuration>
+ <property>
+ <name>mapred.input.dir</name>
+ <value>${input}</value>
+ </property>
+ <property>
+ <name>mapred.output.dir</name>
+ <value>${output}</value>
+ </property>
+ <property>
+ <name>stream.num.map.output.key.fields</name>
+ <value>3</value>
+ </property>
+ </configuration>
+ <file>/users/blabla/testfile.sh#testfile</file>
+
<archive>/users/blabla/testarchive.jar#testarchive</archive>
+ </map-reduce>
+ <ok to="end"/>
+ <error to="kill"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><b>Pipes Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="firstjob">
+ <map-reduce>
+ <resource-manager>foo:8032</resource-manager>
+ <name-node>bar:8020</name-node>
+ <prepare>
+ <delete path="${output}"/>
+ </prepare>
+ <pipes>
+
<program>bin/wordcount-simple#wordcount-simple</program>
+ </pipes>
+ <configuration>
+ <property>
+ <name>mapred.input.dir</name>
+ <value>${input}</value>
+ </property>
+ <property>
+ <name>mapred.output.dir</name>
+ <value>${output}</value>
+ </property>
+ </configuration>
+
<archive>/users/blabla/testarchive.jar#testarchive</archive>
+ </map-reduce>
+ <ok to="end"/>
+ <error to="kill"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><a name="PigAction"></a></p></div></div>
+<div class="section">
+<h4><a name="a3.2.3_Pig_Action"></a>3.2.3 Pig Action</h4>
+<p>The <tt>pig</tt> action starts a Pig job.</p>
+<p>The workflow job will wait until the pig job completes before continuing to
the next action.</p>
+<p>The <tt>pig</tt> action has to be configured with the resource-manager,
name-node, pig script and the necessary parameters and configuration to run the
Pig job.</p>
+<p>A <tt>pig</tt> action can be configured to perform HDFS files/directories
cleanup or HCatalog partitions cleanup before starting the Pig job. This
capability enables Oozie to retry a Pig job in the situation of a transient
failure (Pig creates temporary directories for intermediate data, thus a retry
without cleanup would fail).</p>
+<p>Hadoop JobConf properties can be specified as part of</p>
+<ul>
+
+<li>the <tt>config-default.xml</tt> or</li>
+<li>JobConf XML file bundled with the workflow application or</li>
+<li><global> tag in workflow definition or</li>
+<li>Inline <tt>pig</tt> action configuration.</li>
+</ul>
+<p>The configuration properties are loaded in the above order, i.e.
<tt>job-xml</tt> and <tt>configuration</tt>; later values override earlier
values.</p>
+<p>Inline property values can be parameterized (templatized) using EL
expressions.</p>
+<p>The YARN <tt>yarn.resourcemanager.address</tt> and HDFS
<tt>fs.default.name</tt> properties must not be present in the job-xml and
inline configuration.</p>
+<p>As with Hadoop map-reduce jobs, it is possible to add files and archives
to be available to the Pig job; refer to the <a
href="#FilesArchives">Adding Files and Archives for the Job</a> section.</p>
+<p><b>Syntax for Pig actions in Oozie schema 1.0:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="[NODE-NAME]">
+ <pig>
+ <resource-manager>[RESOURCE-MANAGER]</resource-manager>
+ <name-node>[NAME-NODE]</name-node>
+ <prepare>
+ <delete path="[PATH]"/>
+ ...
+ <mkdir path="[PATH]"/>
+ ...
+ </prepare>
+ <job-xml>[JOB-XML-FILE]</job-xml>
+ <configuration>
+ <property>
+ <name>[PROPERTY-NAME]</name>
+ <value>[PROPERTY-VALUE]</value>
+ </property>
+ ...
+ </configuration>
+ <script>[PIG-SCRIPT]</script>
+ <param>[PARAM-VALUE]</param>
+ ...
+ <param>[PARAM-VALUE]</param>
+ <argument>[ARGUMENT-VALUE]</argument>
+ ...
+ <argument>[ARGUMENT-VALUE]</argument>
+ <file>[FILE-PATH]</file>
+ ...
+ <archive>[FILE-PATH]</archive>
+ ...
+ </pig>
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><b>Syntax for Pig actions in Oozie schema 0.2:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:0.2">
+ ...
+ <action name="[NODE-NAME]">
+ <pig>
+ <job-tracker>[JOB-TRACKER]</job-tracker>
+ <name-node>[NAME-NODE]</name-node>
+ <prepare>
+ <delete path="[PATH]"/>
+ ...
+ <mkdir path="[PATH]"/>
+ ...
+ </prepare>
+ <job-xml>[JOB-XML-FILE]</job-xml>
+ <configuration>
+ <property>
+ <name>[PROPERTY-NAME]</name>
+ <value>[PROPERTY-VALUE]</value>
+ </property>
+ ...
+ </configuration>
+ <script>[PIG-SCRIPT]</script>
+ <param>[PARAM-VALUE]</param>
+ ...
+ <param>[PARAM-VALUE]</param>
+ <argument>[ARGUMENT-VALUE]</argument>
+ ...
+ <argument>[ARGUMENT-VALUE]</argument>
+ <file>[FILE-PATH]</file>
+ ...
+ <archive>[FILE-PATH]</archive>
+ ...
+ </pig>
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><b>Syntax for Pig actions in Oozie schema 0.1:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:0.1">
+ ...
+ <action name="[NODE-NAME]">
+ <pig>
+ <job-tracker>[JOB-TRACKER]</job-tracker>
+ <name-node>[NAME-NODE]</name-node>
+ <prepare>
+ <delete path="[PATH]"/>
+ ...
+ <mkdir path="[PATH]"/>
+ ...
+ </prepare>
+ <job-xml>[JOB-XML-FILE]</job-xml>
+ <configuration>
+ <property>
+ <name>[PROPERTY-NAME]</name>
+ <value>[PROPERTY-VALUE]</value>
+ </property>
+ ...
+ </configuration>
+ <script>[PIG-SCRIPT]</script>
+ <param>[PARAM-VALUE]</param>
+ ...
+ <param>[PARAM-VALUE]</param>
+ <file>[FILE-PATH]</file>
+ ...
+ <archive>[FILE-PATH]</archive>
+ ...
+ </pig>
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>prepare</tt> element, if present, indicates a list of paths to
delete before starting the job. This should be used exclusively for directory
cleanup or dropping of hcatalog tables or table partitions for the job to be
executed. The delete operation will be performed in the
<tt>fs.default.name</tt> filesystem for hdfs URIs. The format for specifying a
hcatalog table URI is <tt>hcat://[metastore server]:[port]/[database
name]/[table name]</tt> and the format for specifying a hcatalog table
partition URI is <tt>hcat://[metastore server]:[port]/[database name]/[table
name]/[partkey1]=[value];[partkey2]=[value]</tt>. In case of a hcatalog URI,
the hive-site.xml needs to be shipped using the <tt>file</tt> tag and the
hcatalog and hive jars need to be placed in the workflow lib directory or
specified using the <tt>archive</tt> tag.</p>
+<p>The <tt>job-xml</tt> element, if present, must refer to a Hadoop JobConf
<tt>job.xml</tt> file bundled in the workflow application. The <tt>job-xml</tt>
element is optional and as of schema 0.4, multiple <tt>job-xml</tt> elements
are allowed in order to specify multiple Hadoop JobConf <tt>job.xml</tt>
files.</p>
+<p>The <tt>configuration</tt> element, if present, contains JobConf properties
for the underlying Hadoop jobs.</p>
+<p>Properties specified in the <tt>configuration</tt> element override
properties specified in the file specified in the <tt>job-xml</tt> element.</p>
+<p>External Stats can be turned on/off by specifying the property
<i>oozie.action.external.stats.write</i> as <i>true</i> or <i>false</i> in the
configuration element of workflow.xml. The default value for this property is
<i>false</i>.</p>
+<p>The inline and job-xml configuration properties are passed to the Hadoop
jobs submitted by the Pig runtime.</p>
+<p>The <tt>script</tt> element contains the pig script to execute. The pig
script can be templatized with variables of the form <tt>${VARIABLE}</tt>. The
values of these variables can then be specified using the <tt>params</tt>
element.</p>
+<p>NOTE: Oozie will perform the parameter substitution before firing the pig
job. This is different from the <a class="externalLink"
href="http://wiki.apache.org/pig/ParameterSubstitution">parameter substitution
mechanism provided by Pig</a>, which has a few limitations.</p>
+<p>The <tt>params</tt> element, if present, contains parameters to be passed
to the pig script.</p>
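+<p>For example, a pig script containing a templatized line such as the
+following (the script contents and paths are illustrative):</p>
+
+<div>
+<div>
+<pre class="source">A = LOAD '${INPUT}' USING PigStorage(',');
+</pre></div></div>
+
+<p>would have <tt>${INPUT}</tt> substituted by Oozie before firing the pig
+job, with the value supplied through a <tt>param</tt> element such as
+<tt><param>INPUT=/user/tucu/input-data</param></tt>.</p>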
+<p><b>In Oozie schema 0.2:</b> The <tt>arguments</tt> element, if present,
contains arguments to be passed to the pig script.</p>
+<p>All the above elements can be parameterized (templatized) using EL
expressions.</p>
+<p><b>Example for Oozie schema 0.2:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:0.2">
+ ...
+ <action name="myfirstpigjob">
+ <pig>
+ <job-tracker>foo:8021</job-tracker>
+ <name-node>bar:8020</name-node>
+ <prepare>
+ <delete path="${jobOutput}"/>
+ </prepare>
+ <configuration>
+ <property>
+ <name>mapred.compress.map.output</name>
+ <value>true</value>
+ </property>
+ <property>
+ <name>oozie.action.external.stats.write</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ <script>/mypigscript.pig</script>
+ <argument>-param</argument>
+ <argument>INPUT=${inputDir}</argument>
+ <argument>-param</argument>
+ <argument>OUTPUT=${outputDir}/pig-output3</argument>
+ </pig>
+ <ok to="myotherjob"/>
+ <error to="errorcleanup"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><b>Example for Oozie schema 0.1:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:0.1">
+ ...
+ <action name="myfirstpigjob">
+ <pig>
+ <job-tracker>foo:8021</job-tracker>
+ <name-node>bar:8020</name-node>
+ <prepare>
+ <delete path="${jobOutput}"/>
+ </prepare>
+ <configuration>
+ <property>
+ <name>mapred.compress.map.output</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ <script>/mypigscript.pig</script>
+ <param>InputDir=/home/tucu/input-data</param>
+ <param>OutputDir=${jobOutput}</param>
+ </pig>
+ <ok to="myotherjob"/>
+ <error to="errorcleanup"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><a name="FsAction"></a></p></div>
+<div class="section">
+<h4><a name="a3.2.4_Fs_HDFS_action"></a>3.2.4 Fs (HDFS) action</h4>
+<p>The <tt>fs</tt> action allows manipulation of files and directories in
HDFS from a workflow application. The supported commands are <tt>move</tt>,
<tt>delete</tt>, <tt>mkdir</tt>, <tt>chmod</tt>, <tt>touchz</tt>,
<tt>setrep</tt> and <tt>chgrp</tt>.</p>
+<p>The FS commands are executed synchronously from within the FS action; the
workflow job will wait until the specified file commands are completed before
continuing to the next action.</p>
+<p>Path names specified in the <tt>fs</tt> action can be parameterized
(templatized) using EL expressions. Path names should be specified as absolute
paths. In the case of the <tt>move</tt>, <tt>delete</tt>, <tt>chmod</tt> and
<tt>chgrp</tt> commands, a glob pattern can also be specified instead of an
absolute path. For <tt>move</tt>, a glob pattern can only be specified for the
source path, not the target.</p>
+<p>Each file path must specify the file system URI; for move operations, the
target must not specify the system URI.</p>
+<p><b>IMPORTANT:</b> For copying files within a cluster, it is recommended to
use the <a href="DG_DistCpActionExtension.html"><tt>distcp</tt></a> action
instead.</p>
+<p><b>IMPORTANT:</b> The commands within an <tt>fs</tt> action do not execute
atomically; if an <tt>fs</tt> action fails halfway through the commands being
executed, the successfully executed commands are not rolled back. Before
executing any command, the <tt>fs</tt> action must check that source paths
exist and target paths don’t exist (the constraint regarding targets is
relaxed for the <tt>move</tt> action; see below for details), thus failing
before executing any command. The validity of all paths specified in one
<tt>fs</tt> action is therefore evaluated before any of the file operations
are executed, so there is less chance of an error occurring while the
<tt>fs</tt> action executes.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="[NODE-NAME]">
+ <fs>
+ <delete path='[PATH]' skip-trash='[true/false]'/>
+ ...
+ <mkdir path='[PATH]'/>
+ ...
+ <move source='[SOURCE-PATH]' target='[TARGET-PATH]'/>
+ ...
+ <chmod path='[PATH]' permissions='[PERMISSIONS]'
dir-files='false' />
+ ...
+ <touchz path='[PATH]' />
+ ...
+ <chgrp path='[PATH]' group='[GROUP]' dir-files='false' />
+ ...
+ <setrep path='[PATH]' replication-factor='2'/>
+ </fs>
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>delete</tt> command deletes the specified path; if it is a
directory, it recursively deletes all its content and then deletes the
directory. By default it skips the trash; the content can be moved to the
trash instead by setting the value of <tt>skip-trash</tt> to
‘false’. It can also be used to drop hcat tables/partitions; this
is the only FS command which supports HCatalog URIs as well. For example:</p>
+
+<div>
+<div>
+<pre class="source"><delete path='hcat://[metastore
server]:[port]/[database name]/[table name]'/>
+OR
+<delete path='hcat://[metastore server]:[port]/[database name]/[table
name]/[partkey1]=[value];[partkey2]=[value];...'/>
+</pre></div></div>
+
+<p>The <tt>mkdir</tt> command creates the specified directory; it creates all
missing directories in the path. If the directory already exists, it is a
no-op.</p>
+<p>In the <tt>move</tt> command the <tt>source</tt> path must exist. The
following scenarios are addressed for a <tt>move</tt>:</p>
+<ul>
+
+<li>The file system URI (e.g. <tt>hdfs://{nameNode}</tt>) can be skipped in
the <tt>target</tt> path. It is understood to be the same as that of the
source. But if the target path does contain the system URI, it cannot be
different from that of the source.</li>
+<li>The parent directory of the <tt>target</tt> path must exist</li>
+<li>For the <tt>target</tt> path, if it is a file, then it must not already
exist.</li>
+<li>However, if the <tt>target</tt> path is an already existing directory, the
<tt>move</tt> action will place your <tt>source</tt> as a child of the
<tt>target</tt> directory.</li>
+</ul>
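+<p>A minimal sketch of a <tt>move</tt> into an existing directory (the paths
+are illustrative): the target omits the file system URI, so it is resolved
+against the filesystem of the source, and since the target is an existing
+directory the source is placed as a child of it:</p>
+
+<div>
+<div>
+<pre class="source">    <fs>
+        <move source='hdfs://foo:8020/user/tucu/data/file1'
+              target='/user/tucu/archive'/>
+    </fs>
+</pre></div></div>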
+<p>The <tt>chmod</tt> command changes the permissions for the specified path.
Permissions can be specified using the Unix Symbolic representation (e.g.
-rwxrw-rw-) or an octal representation (755). When doing a <tt>chmod</tt>
command on a directory, by default the command is applied to the directory and
the files one level within the directory. To apply the <tt>chmod</tt> command
to the directory, without affecting the files within it, the <tt>dir-files</tt>
attribute must be set to <tt>false</tt>. To apply the <tt>chmod</tt> command
recursively to all levels within a directory, put a <tt>recursive</tt> element
inside the <chmod> element.</p>
+<p>The <tt>touchz</tt> command creates a zero length file in the specified
path if none exists. If one already exists, then touchz will perform a touch
operation. Touchz works only for absolute paths.</p>
+<p>The <tt>chgrp</tt> command changes the group for the specified path. When running a <tt>chgrp</tt> command on a directory, by default the command is applied to the directory and the files one level within it. To apply the <tt>chgrp</tt> command to the directory only, without affecting the files within it, the <tt>dir-files</tt> attribute must be set to <tt>false</tt>. To apply the <tt>chgrp</tt> command recursively to all levels within a directory, put a <tt>recursive</tt> element inside the <tt>chgrp</tt> element.</p>
+<p>The <tt>setrep</tt> command changes the replication factor of HDFS files. Changing the replication factor of directories or symlinks is not supported; this command requires a replication factor argument.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="hdfscommands">
+ <fs>
+ <delete path='hdfs://foo:8020/usr/tucu/temp-data'/>
+ <mkdir path='archives/${wf:id()}'/>
+ <move source='${jobInput}'
target='archives/${wf:id()}/processed-input'/>
+ <chmod path='${jobOutput}' permissions='-rwxrw-rw-'
dir-files='true'><recursive/></chmod>
+ <chgrp path='${jobOutput}' group='testgroup'
dir-files='true'><recursive/></chgrp>
+ <setrep path='archives/${wf:id()/filename(s)}'
replication-factor='2'/>
+ </fs>
+ <ok to="myotherjob"/>
+ <error to="errorcleanup"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>In the above example, a directory named after the workflow job ID is created, and the input of the job, passed as a workflow configuration parameter, is archived under the newly created directory.</p>
+<p>As of schema 0.4, if a <tt>name-node</tt> element is specified, then it is not necessary for any of the paths to start with the file system URI, as it is taken from the <tt>name-node</tt> element. This is also true if the name-node is specified in the global section (see <a href="WorkflowFunctionalSpec.html#GlobalConfigurations">Global Configurations</a>).</p>
+<p>As of schema 0.4, zero or more <tt>job-xml</tt> elements can be specified;
these must refer to Hadoop JobConf <tt>job.xml</tt> formatted files bundled in
the workflow application. They can be used to set additional properties for the
FileSystem instance.</p>
+<p>As of schema 0.4, if a <tt>configuration</tt> element is specified, then it
will also be used to set additional JobConf properties for the FileSystem
instance. Properties specified in the <tt>configuration</tt> element override
properties specified in the files specified by any <tt>job-xml</tt>
elements.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:0.4">
+ ...
+ <action name="hdfscommands">
+ <fs>
+ <name-node>hdfs://foo:8020</name-node>
+ <job-xml>fs-info.xml</job-xml>
+ <configuration>
+ <property>
+ <name>some.property</name>
+ <value>some.value</value>
+ </property>
+ </configuration>
+ <delete path='/usr/tucu/temp-data'/>
+ </fs>
+ <ok to="myotherjob"/>
+ <error to="errorcleanup"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p><a name="SubWorkflowAction"></a></p></div>
+<div class="section">
+<h4><a name="a3.2.5_Sub-workflow_Action"></a>3.2.5 Sub-workflow Action</h4>
+<p>The <tt>sub-workflow</tt> action runs a child workflow job.</p>
+<p>The parent workflow job will wait until the child workflow job has
completed.</p>
+<p>There can be several sub-workflows defined within a single workflow, each
under its own action element.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="[NODE-NAME]">
+ <sub-workflow>
+ <app-path>[WF-APPLICATION-PATH]</app-path>
+ <propagate-configuration/>
+ <configuration>
+ <property>
+ <name>[PROPERTY-NAME]</name>
+ <value>[PROPERTY-VALUE]</value>
+ </property>
+ ...
+ </configuration>
+ </sub-workflow>
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The child workflow job runs in the same Oozie system instance where the
parent workflow job is running.</p>
+<p>The <tt>app-path</tt> element specifies the path to the workflow
application of the child workflow job.</p>
+<p>The <tt>propagate-configuration</tt> flag, if present, indicates that the
workflow job configuration should be propagated to the child workflow.</p>
+<p>The <tt>configuration</tt> section can be used to specify the job
properties that are required to run the child workflow job.</p>
+<p>The configuration of the <tt>sub-workflow</tt> action can be parameterized
(templatized) using EL expressions.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="sample-wf"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="a">
+ <sub-workflow>
+ <app-path>child-wf</app-path>
+ <configuration>
+ <property>
+ <name>input.dir</name>
+ <value>${wf:id()}/second-mr-output</value>
+ </property>
+ </configuration>
+ </sub-workflow>
+ <ok to="end"/>
+ <error to="kill"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>In the above example, the workflow definition with the name <tt>child-wf</tt> will be run on the same Oozie instance (e.g. at <tt>http://myhost:11000/oozie</tt>). The specified workflow application must already be deployed on the target Oozie instance.</p>
+<p>A configuration parameter <tt>input.dir</tt> is being passed as job
property to the child workflow job.</p>
+<p>The subworkflow can inherit the lib jars from the parent workflow by
setting <tt>oozie.subworkflow.classpath.inheritance</tt> to true in
oozie-site.xml or on a per-job basis by setting
<tt>oozie.wf.subworkflow.classpath.inheritance</tt> to true in a job.properties
file. If both are specified,
<tt>oozie.wf.subworkflow.classpath.inheritance</tt> has priority. If the subworkflow and the parent have conflicting jars, the subworkflow's jar has priority. By default, <tt>oozie.wf.subworkflow.classpath.inheritance</tt> is set to false.</p>
+<p>To prevent errant workflows from starting infinitely recursive
subworkflows, <tt>oozie.action.subworkflow.max.depth</tt> can be specified in
oozie-site.xml to set the maximum depth of subworkflow calls. For example, if
set to 3, then a workflow can start subwf1, which can start subwf2, which can
start subwf3; but if subwf3 tries to start subwf4, then the action will fail.
The default is 50.</p>
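+<p>As an illustrative sketch (the values shown are assumptions for this example, not defaults), the two site-level settings described above might appear in <tt>oozie-site.xml</tt> as:</p>

```xml
<!-- oozie-site.xml fragment (illustrative values) -->
<property>
    <!-- let all subworkflows inherit the parent workflow's lib jars -->
    <name>oozie.subworkflow.classpath.inheritance</name>
    <value>true</value>
</property>
<property>
    <!-- allow at most 3 levels of nested subworkflow calls -->
    <name>oozie.action.subworkflow.max.depth</name>
    <value>3</value>
</property>
```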
+<p><a name="JavaAction"></a></p></div>
+<div class="section">
+<h4><a name="a3.2.6_Java_Action"></a>3.2.6 Java Action</h4>
+<p>The <tt>java</tt> action will execute the <tt>public static void
main(String[] args)</tt> method of the specified main Java class.</p>
+<p>Java applications are executed in the Hadoop cluster as a map-reduce job with a single Mapper task.</p>
+<p>The workflow job will wait until the Java application completes its execution before continuing to the next action.</p>
+<p>The <tt>java</tt> action has to be configured with the resource-manager,
name-node, main Java class, JVM options and arguments.</p>
+<p>To indicate an <tt>ok</tt> action transition, the main Java class must complete the <tt>main</tt> method invocation gracefully.</p>
+<p>To indicate an <tt>error</tt> action transition, the main Java class must
throw an exception.</p>
+<p>The main Java class can call <tt>System.exit(int n)</tt>. Exit code zero is
regarded as OK, while non-zero exit codes will cause the <tt>java</tt> action
to do an <tt>error</tt> transition and exit.</p>
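+<p>A minimal, hypothetical main class illustrating these transition rules (the class name and the argument check are assumptions for illustration only):</p>

```java
// Hypothetical main class for a java action, illustrating transitions.
public class MyMain {
    public static void main(String[] args) {
        if (args.length == 0) {
            // An uncaught exception makes the action take the error transition.
            throw new IllegalArgumentException("expected at least one argument");
        }
        // Completing main normally (or calling System.exit(0)) yields the
        // ok transition; System.exit with a non-zero code yields error.
        System.out.println("processing " + args[0]);
    }
}
```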
+<p>A <tt>java</tt> action can be configured to perform HDFS files/directories cleanup or HCatalog partitions cleanup before starting the Java application. This capability enables Oozie to retry a Java application after a transient or non-transient failure; the cleanup can remove any temporary data the Java application may have created before failing.</p>
+<p>A <tt>java</tt> action can create a Hadoop configuration for interacting with a cluster (e.g. launching a map-reduce job). Oozie prepares a Hadoop configuration file which includes the environment's site configuration files (e.g. hdfs-site.xml, mapred-site.xml, etc.) plus the properties added to the <tt>configuration</tt> section of the <tt>java</tt> action. The Hadoop configuration file is made available as a local file to the Java application in its running directory. It can be added to the <tt>java</tt> action's Hadoop configuration by referencing the system property <tt>oozie.action.conf.xml</tt>. For example:</p>
+
+<div>
+<div>
+<pre class="source">// loading action conf prepared by Oozie
+Configuration actionConf = new Configuration(false);
+actionConf.addResource(new Path("file:///",
System.getProperty("oozie.action.conf.xml")));
+</pre></div></div>
+
+<p>If <tt>oozie.action.conf.xml</tt> is not added, the job will pick up the mapred-default properties, which may result in unexpected behaviour. For repeated configuration properties, later values override earlier ones.</p>
+<p>Inline property values can be parameterized (templatized) using EL
expressions.</p>
+<p>The YARN <tt>yarn.resourcemanager.address</tt> (<tt>resource-manager</tt>) and HDFS <tt>fs.default.name</tt> (<tt>name-node</tt>) properties must not be present either in the <tt>job-xml</tt> file or in the inline configuration.</p>
+<p>As with <tt>map-reduce</tt> and <tt>pig</tt> actions, it is possible to add files and archives to be available to the Java application. Refer to section <a href="WorkflowFunctionalSpec.html#FilesArchives">Adding Files and Archives for the Job</a>.</p>
+<p>The <tt>capture-output</tt> element can be used to propagate values back into the Oozie context, where they can then be accessed via EL functions. The values must be written out as a Java properties file; the filename is obtained via a system property specified by the constant <tt>oozie.action.output.properties</tt>.</p>
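+<p>As a sketch, a Java application might write its captured output like this. The class name, the <tt>processedRecords</tt> key, and the temp-file fallback are assumptions for illustration; inside a real action the <tt>oozie.action.output.properties</tt> system property is always set by Oozie:</p>

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

// Sketch: propagate values back to Oozie via the capture-output mechanism.
public class CaptureOutputExample {

    // Writes the given key/value pairs to the properties file whose path
    // Oozie passes in the oozie.action.output.properties system property.
    // The temp-file fallback exists only so this sketch runs outside Oozie.
    public static File writeCapturedOutput(Properties values) throws IOException {
        String path = System.getProperty("oozie.action.output.properties");
        File out = (path != null)
                ? new File(path)
                : File.createTempFile("oozie-action-output", ".properties");
        try (FileOutputStream os = new FileOutputStream(out)) {
            values.store(os, "values propagated back to the Oozie context");
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Properties values = new Properties();
        values.setProperty("processedRecords", "12345"); // illustrative key
        File out = writeCapturedOutput(values);

        // Read the file back to confirm the round trip.
        Properties readBack = new Properties();
        try (FileInputStream is = new FileInputStream(out)) {
            readBack.load(is);
        }
        System.out.println(readBack.getProperty("processedRecords"));
    }
}
```

Values written this way can then be read from a later workflow node via EL, e.g. <tt>${wf:actionData('java-node')['processedRecords']}</tt> (node name assumed for illustration).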
+<p><b>IMPORTANT:</b> In order for a Java action to succeed on a secure
cluster, it must propagate the Hadoop delegation token like in the following
code snippet (this is benign on non-secure clusters):</p>
+
+<div>
+<div>
+<pre class="source">// propagate delegation related props from launcher job to
MR job
+if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
+ jobConf.set("mapreduce.job.credentials.binary",
System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
+}
+</pre></div></div>
+
+<p><b>IMPORTANT:</b> Because the Java application is run from within a Map-Reduce job, from Hadoop 0.20 onwards a queue must be assigned to it. The queue name must be specified as a configuration property.</p>
+<p><b>IMPORTANT:</b> The Java application from a Java action is executed in a
single map task. If the task is abnormally terminated, such as due to a
TaskTracker restart (e.g. during cluster maintenance), the task will be retried
via the normal Hadoop task retry mechanism. To avoid workflow failure, the
application should be written in a fashion that is resilient to such retries,
for example by detecting and deleting incomplete outputs or picking back up
from complete outputs. Furthermore, if a Java action spawns asynchronous
activity outside the JVM of the action itself (such as by launching additional
MapReduce jobs), the application must consider the possibility of collisions
with activity spawned by the new instance.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source"><workflow-app name="[WF-DEF-NAME]"
xmlns="uri:oozie:workflow:1.0">
+ ...
+ <action name="[NODE-NAME]">
+ <java>
+ <resource-manager>[RESOURCE-MANAGER]</resource-manager>
+ <name-node>[NAME-NODE]</name-node>
+ <prepare>
+ <delete path="[PATH]"/>
+ ...
+ <mkdir path="[PATH]"/>
+ ...
+ </prepare>
+ <job-xml>[JOB-XML]</job-xml>
+ <configuration>
+ <property>
+ <name>[PROPERTY-NAME]</name>
+ <value>[PROPERTY-VALUE]</value>
+ </property>
+ ...
+ </configuration>
+ <main-class>[MAIN-CLASS]</main-class>
+ <java-opts>[JAVA-STARTUP-OPTS]</java-opts>
+ <arg>ARGUMENT</arg>
+ ...
+ <file>[FILE-PATH]</file>
+ ...
+ <archive>[FILE-PATH]</archive>
+ ...
+ <capture-output />
+ </java>
+ <ok to="[NODE-NAME]"/>
+ <error to="[NODE-NAME]"/>
+ </action>
+ ...
+</workflow-app>
+</pre></div></div>
+
+<p>The <tt>prepare</tt> element, if present, indicates a list of paths to delete before starting the Java application. It should be used exclusively for directory cleanup or for dropping HCatalog tables or table partitions needed by the Java application to be executed. In the case of <tt>delete</tt>, a glob pattern can be used to specify the path. The format for specifying an HCatalog table URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]</tt>, and the format for specifying an HCatalog table partition URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]/[partkey1]=[value];[partkey2]=[value]</tt>. In the case of an HCatalog URI, hive-site.xml needs to be shipped using the <tt>file</tt> tag, and the HCatalog and Hive jars need to be placed in the workflow lib directory or specified using the <tt>archive</tt> tag.</p>
[... 3526 lines stripped ...]
Added: websites/staging/oozie/trunk/content/docs/5.2.0/configuration.xsl
==============================================================================
Binary file - no diff available.
Propchange: websites/staging/oozie/trunk/content/docs/5.2.0/configuration.xsl
------------------------------------------------------------------------------
svn:mime-type = application/xml