Added: oozie/site/trunk/content/resources/docs/5.0.0/WorkflowFunctionalSpec.html
URL: 
http://svn.apache.org/viewvc/oozie/site/trunk/content/resources/docs/5.0.0/WorkflowFunctionalSpec.html?rev=1828722&view=auto
==============================================================================
--- oozie/site/trunk/content/resources/docs/5.0.0/WorkflowFunctionalSpec.html 
(added)
+++ oozie/site/trunk/content/resources/docs/5.0.0/WorkflowFunctionalSpec.html 
Mon Apr  9 14:12:36 2018
@@ -0,0 +1,5648 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at Apr 9, 2018 
+ | Rendered using Apache Maven Fluido Skin 1.4
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Oozie - </title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" 
src="./js/apache-maven-fluido-1.4.min.js"></script>
+
+    
+                  </head>
+        <body class="topBarDisabled">
+          
+        
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="https://oozie.apache.org/" 
id="bannerLeft">
+                                                                               
         <img src="https://oozie.apache.org/images/oozie_200x.png"  
alt="Oozie"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="../../" title="Apache">
+        Apache</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../../" title="Oozie">
+        Oozie</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../" title="docs">
+        docs</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="./" title="5.0.0">
+        5.0.0</a>
+                    <span class="divider">/</span>
+      </li>
+        <li class="active ">Oozie - </li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right"><span 
class="divider">|</span> Last Published: 2018-04-09</li>
+              <li id="projectVersion" class="pull-right">
+                    Version: 5.0.0
+        </li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span2">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+  </ul>
+                
+                    
+                
+          <hr />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built 
by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" 
src="./images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span10" >
+                                  
+            <p></p>
+<p><a href="./index.html">::Go back to Oozie Documentation Index::</a>
+</p>
+<hr />
+<a name="Oozie_Specification_a_Hadoop_Workflow_System"></a>
+<div class="section"><h2> Oozie Specification, a Hadoop Workflow System</h2>
+<p><b><center>(v5.0)</center></b>
+</p>
+<p>The goal of this document is to define a workflow engine system specialized 
in coordinating the execution of Hadoop
+Map/Reduce and Pig jobs.</p>
+<p><ul><ul><li><a href="#Changelog">Changelog</a>
+<ul></ul>
+</li>
+<li><a href="#a0_Definitions">0 Definitions</a>
+</li>
+<li><a href="#a1_Specification_Highlights">1 Specification Highlights</a>
+</li>
+<li><a href="#a2_Workflow_Definition">2 Workflow Definition</a>
+<ul><li><a href="#a2.1_Cycles_in_Workflow_Definitions">2.1 Cycles in Workflow 
Definitions</a>
+</li>
+</ul>
+</li>
+<li><a href="#a3_Workflow_Nodes">3 Workflow Nodes</a>
+<ul><li><a href="#a3.1_Control_Flow_Nodes">3.1 Control Flow Nodes</a>
+<ul><li><a href="#a3.1.1_Start_Control_Node">3.1.1 Start Control Node</a>
+</li>
+<li><a href="#a3.1.2_End_Control_Node">3.1.2 End Control Node</a>
+</li>
+<li><a href="#a3.1.3_Kill_Control_Node">3.1.3 Kill Control Node</a>
+</li>
+<li><a href="#a3.1.4_Decision_Control_Node">3.1.4 Decision Control Node</a>
+</li>
+<li><a href="#a3.1.5_Fork_and_Join_Control_Nodes">3.1.5 Fork and Join Control 
Nodes</a>
+</li>
+</ul>
+</li>
+<li><a href="#a3.2_Workflow_Action_Nodes">3.2 Workflow Action Nodes</a>
+<ul><li><a href="#a3.2.1_Action_Basis">3.2.1 Action Basis</a>
+<ul><li><a 
href="#a3.2.1.1_Action_ComputationProcessing_Is_Always_Remote">3.2.1.1 Action 
Computation/Processing Is Always Remote</a>
+</li>
+<li><a href="#a3.2.1.2_Actions_Are_Asynchronous">3.2.1.2 Actions Are 
Asynchronous</a>
+</li>
+<li><a href="#a3.2.1.3_Actions_Have_2_Transitions_ok_and_error">3.2.1.3 
Actions Have 2 Transitions, =ok= and =error=</a>
+</li>
+<li><a href="#a3.2.1.4_Action_Recovery">3.2.1.4 Action Recovery</a>
+</li>
+</ul>
+</li>
+<li><a href="#a3.2.2_Map-Reduce_Action">3.2.2 Map-Reduce Action</a>
+<ul><li><a href="#a3.2.2.1_Adding_Files_and_Archives_for_the_Job">3.2.2.1 
Adding Files and Archives for the Job</a>
+</li>
+<li><a 
href="#a3.2.2.2_Configuring_the_MapReduce_action_with_Java_code">3.2.2.2 
Configuring the MapReduce action with Java code</a>
+</li>
+<li><a href="#a3.2.2.3_Streaming">3.2.2.3 Streaming</a>
+</li>
+<li><a href="#a3.2.2.4_Pipes">3.2.2.4 Pipes</a>
+</li>
+<li><a href="#a3.2.2.5_Syntax">3.2.2.5 Syntax</a>
+</li>
+</ul>
+</li>
+<li><a href="#a3.2.3_Pig_Action">3.2.3 Pig Action</a>
+</li>
+<li><a href="#a3.2.4_Fs_HDFS_action">3.2.4 Fs (HDFS) action</a>
+</li>
+<li><a href="#a3.2.5_Sub-workflow_Action">3.2.5 Sub-workflow Action</a>
+</li>
+<li><a href="#a3.2.6_Java_Action">3.2.6 Java Action</a>
+<ul><li><a href="#a3.2.6.1_Overriding_an_actions_Main_class">3.2.6.1 
Overriding an action's Main class</a>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#a4_Parameterization_of_Workflows">4 Parameterization of 
Workflows</a>
+<ul><li><a href="#a4.1_Workflow_Job_Properties_or_Parameters">4.1 Workflow Job 
Properties (or Parameters)</a>
+</li>
+<li><a href="#a4.2_Expression_Language_Functions">4.2 Expression Language 
Functions</a>
+<ul><li><a href="#a4.2.1_Basic_EL_Constants">4.2.1 Basic EL Constants</a>
+</li>
+<li><a href="#a4.2.2_Basic_EL_Functions">4.2.2 Basic EL Functions</a>
+</li>
+<li><a href="#a4.2.3_Workflow_EL_Functions">4.2.3 Workflow EL Functions</a>
+</li>
+<li><a href="#a4.2.4_Hadoop_EL_Constants">4.2.4 Hadoop EL Constants</a>
+</li>
+<li><a href="#a4.2.5_Hadoop_EL_Functions">4.2.5 Hadoop EL Functions</a>
+</li>
+<li><a href="#a4.2.6_Hadoop_Jobs_EL_Function">4.2.6 Hadoop Jobs EL Function</a>
+</li>
+<li><a href="#a4.2.7_HDFS_EL_Functions">4.2.7 HDFS EL Functions</a>
+</li>
+<li><a href="#a4.2.8_HCatalog_EL_Functions">4.2.8 HCatalog EL Functions</a>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#a5_Workflow_Notifications">5 Workflow Notifications</a>
+<ul><li><a href="#a5.1_Workflow_Job_Status_Notification">5.1 Workflow Job 
Status Notification</a>
+</li>
+<li><a href="#a5.2_Node_Start_and_End_Notifications">5.2 Node Start and End 
Notifications</a>
+</li>
+</ul>
+</li>
+<li><a href="#a6_User_Propagation">6 User Propagation</a>
+</li>
+<li><a href="#a7_Workflow_Application_Deployment">7 Workflow Application 
Deployment</a>
+</li>
+<li><a href="#a8_External_Data_Assumptions">8 External Data Assumptions</a>
+</li>
+<li><a href="#a9_Workflow_Jobs_Lifecycle">9 Workflow Jobs Lifecycle</a>
+<ul><li><a href="#a9.1_Workflow_Job_Lifecycle">9.1 Workflow Job Lifecycle</a>
+</li>
+<li><a href="#a9.2_Workflow_Action_Lifecycle">9.2 Workflow Action Lifecycle</a>
+</li>
+</ul>
+</li>
+<li><a href="#a10_Workflow_Jobs_Recovery_re-run">10 Workflow Jobs Recovery 
(re-run)</a>
+</li>
+<li><a href="#a11_Oozie_Web_Services_API">11 Oozie Web Services API</a>
+</li>
+<li><a href="#a12_Client_API">12 Client API</a>
+</li>
+<li><a href="#a13_Command_Line_Tools">13 Command Line Tools</a>
+</li>
+<li><a href="#a14_Web_UI_Console">14 Web UI Console</a>
+</li>
+<li><a href="#a15_Customizing_Oozie_with_Extensions">15 Customizing Oozie with 
Extensions</a>
+</li>
+<li><a href="#a16_Workflow_Jobs_Priority">16 Workflow Jobs Priority</a>
+</li>
+<li><a 
href="#a17_HDFS_Share_Libraries_for_Workflow_Applications_since_Oozie_2.3">17 
HDFS Share Libraries for Workflow Applications (since Oozie 2.3)</a>
+<ul><li><a href="#a17.1_Action_Share_Library_Override_since_Oozie_3.3">17.1 
Action Share Library Override (since Oozie 3.3)</a>
+</li>
+</ul>
+</li>
+<li><a href="#a18_User-Retry_for_Workflow_Actions_since_Oozie_3.1">18 
User-Retry for Workflow Actions (since Oozie 3.1)</a>
+</li>
+<li><a href="#a19_Global_Configurations">19 Global Configurations</a>
+</li>
+<li><a href="#a20_Suspend_On_Nodes">20 Suspend On Nodes</a>
+</li>
+<li><a href="#Appendixes">Appendixes</a>
+<ul><li><a href="#Appendix_A_Oozie_Workflow_and_Common_XML_Schemas">Appendix 
A, Oozie Workflow and Common XML Schemas</a>
+<ul><li><a href="#Oozie_Workflow_Schema_Version_1.0">Oozie Workflow Schema 
Version 1.0</a>
+</li>
+<li><a href="#Oozie_Common_Schema_Version_1.0">Oozie Common Schema Version 
1.0</a>
+</li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.5">Oozie Workflow Schema Version 
0.5</a>
+</li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.4.5">Oozie Workflow Schema 
Version 0.4.5</a>
+</li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.4">Oozie Workflow Schema Version 
0.4</a>
+</li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.3">Oozie Workflow Schema Version 
0.3</a>
+</li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.2.5">Oozie Workflow Schema 
Version 0.2.5</a>
+</li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.2">Oozie Workflow Schema Version 
0.2</a>
+</li>
+<li><a href="#Oozie_SLA_Version_0.2">Oozie SLA Version 0.2</a>
+</li>
+<li><a href="#Oozie_SLA_Version_0.1">Oozie SLA Version 0.1</a>
+</li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.1">Oozie Workflow Schema Version 
0.1</a>
+</li>
+</ul>
+</li>
+<li><a href="#Appendix_B_Workflow_Examples">Appendix B, Workflow Examples</a>
+<ul></ul>
+</li>
+</ul>
+</li>
+</ul>
+</ul>
+</p>
+<a name="Changelog"></a>
+<div class="section"><h3>Changelog</h3>
+<a name="a2016FEB19"></a>
+<div class="section"><h4> 2016FEB19</h4>
+<p><ul><li>#3.2.7 Updated notes on System.exit(int n) behavior</li>
+</ul>
+</p>
+<a name="a2015APR29"></a>
+</div>
+<div class="section"><h4> 2015APR29</h4>
+<p><ul><li>#3.2.1.4 Added notes about Java action retries</li>
+<li>#3.2.7 Added notes about Java action retries</li>
+</ul>
+</p>
+<a name="a2014MAY08"></a>
+</div>
+<div class="section"><h4> 2014MAY08</h4>
+<p><ul><li>#3.2.2.4 Added support for fully qualified job-xml path</li>
+</ul>
+</p>
+<a name="a2013JUL03"></a>
+</div>
+<div class="section"><h4> 2013JUL03</h4>
+<p><ul><li>#Appendix A, Added new workflow schema 0.5 and SLA schema 0.2</li>
+</ul>
+</p>
+<a name="a2012AUG30"></a>
+</div>
+<div class="section"><h4> 2012AUG30</h4>
+<p><ul><li>#4.2.2 Added two EL functions (replaceAll and appendAll)</li>
+</ul>
+</p>
+<a name="a2012JUL26"></a>
+</div>
+<div class="section"><h4> 2012JUL26</h4>
+<p><ul><li>#Appendix A, updated XML schema 0.4 to include <tt>parameters</tt>
+ element</li>
+<li>#4.1 Updated to mention about <tt>parameters</tt>
+ element as of schema 0.4</li>
+</ul>
+</p>
+<a name="a2012JUL23"></a>
+</div>
+<div class="section"><h4> 2012JUL23</h4>
+<p><ul><li>#Appendix A, updated XML schema 0.4 (Fs action)</li>
+<li>#3.2.4 Updated to mention that a <tt>name-node</tt>
+, a <tt>job-xml</tt>
+, and a <tt>configuration</tt>
+ element are allowed in the Fs action as of schema 0.4</li>
+</ul>
+</p>
+<a name="a2012JUN19"></a>
+</div>
+<div class="section"><h4> 2012JUN19</h4>
+<p><ul><li>#Appendix A, added XML schema 0.4</li>
+<li>#3.2.2.4 Updated to mention that multiple <tt>job-xml</tt>
+ elements are allowed as of schema 0.4</li>
+<li>#3.2.3 Updated to mention that multiple <tt>job-xml</tt>
+ elements are allowed as of schema 0.4</li>
+</ul>
+</p>
+<a name="a2011AUG17"></a>
+</div>
+<div class="section"><h4> 2011AUG17</h4>
+<p><ul><li>#3.2.4 fs 'chmod' xml closing element typo in Example corrected</li>
+</ul>
+</p>
+<a name="a2011AUG12"></a>
+</div>
+<div class="section"><h4> 2011AUG12</h4>
+<p><ul><li>#3.2.4 fs 'move' action characteristics updated, to allow for 
consistent source and target paths and existing target path only if 
directory</li>
+<li>#18, Update the doc for user-retry of workflow action.</li>
+</ul>
+</p>
+<a name="a2011FEB19"></a>
+</div>
+<div class="section"><h4> 2011FEB19</h4>
+<p><ul><li>#10, Update the doc to rerun from the failed node.</li>
+</ul>
+</p>
+<a name="a2010OCT31"></a>
+</div>
+<div class="section"><h4> 2010OCT31</h4>
+<p><ul><li>#17, Added new section on Shared Libraries</li>
+</ul>
+</p>
+<a name="a2010APR27"></a>
+</div>
+<div class="section"><h4> 2010APR27</h4>
+<p><ul><li>#3.2.3 Added new &quot;arguments&quot; tag to PIG actions</li>
+<li>#3.2.5 SSH actions are deprecated in Oozie schema 0.1 and removed in Oozie 
schema 0.2</li>
+<li>#Appendix A, Added schema version 0.2</li>
+</ul>
+</p>
+<a name="a2009OCT20"></a>
+</div>
+<div class="section"><h4> 2009OCT20</h4>
+<p><ul><li>#Appendix A, updated XML schema</li>
+</ul>
+</p>
+<a name="a2009SEP15"></a>
+</div>
+<div class="section"><h4> 2009SEP15</h4>
+<p><ul><li>#3.2.6 Removing support for sub-workflow in a different Oozie 
instance (removing the 'oozie' element)</li>
+</ul>
+</p>
+<a name="a2009SEP07"></a>
+</div>
+<div class="section"><h4> 2009SEP07</h4>
+<p><ul><li>#3.2.2.3 Added Map Reduce Pipes specifications.</li>
+<li>#3.2.2.4 Map-Reduce Examples. Previously was 3.2.2.3.</li>
+</ul>
+</p>
+<a name="a2009SEP02"></a>
+</div>
+<div class="section"><h4> 2009SEP02</h4>
+<p><ul><li>#10 Added missing skip nodes property name.</li>
+<li>#3.2.1.4 Reworded action recovery explanation.</li>
+</ul>
+</p>
+<a name="a2009AUG26"></a>
+</div>
+<div class="section"><h4> 2009AUG26</h4>
+<p><ul><li>#3.2.9 Added <tt>java</tt>
+ action type</li>
+<li>#3.1.4 Example uses EL constant to refer to counter group/name</li>
+</ul>
+</p>
+<a name="a2009JUN09"></a>
+</div>
+<div class="section"><h4> 2009JUN09</h4>
+<p><ul><li>#12.2.4 Added build version resource to admin end-point</li>
+<li>#3.2.6 Added flag to propagate workflow configuration to sub-workflows</li>
+<li>#10 Added behavior for workflow job parameters given in the rerun</li>
+<li>#11.3.4 workflows info returns pagination information</li>
+</ul>
+</p>
+<a name="a2009MAY18"></a>
+</div>
+<div class="section"><h4> 2009MAY18</h4>
+<p><ul><li>#3.1.4 decision node, 'default' element, 'name' attribute changed 
to 'to'</li>
+<li>#3.1.5 fork node, 'transition' element changed to 'start', 'to' attribute 
change to 'path'</li>
+<li>#3.1.5 join node, 'transition' element remove, added 'to' attribute to 
'join' element</li>
+<li>#3.2.1.4 Rewording on action recovery section</li>
+<li>#3.2.2 map-reduce action, added 'job-tracker', 'name-node' actions, 
'file', 'file' and 'archive' elements</li>
+<li>#3.2.2.1 map-reduce action, remove from 'streaming' element 'file', 'file' 
and 'archive' elements</li>
+<li>#3.2.2.2 map-reduce action, reorganized streaming section</li>
+<li>#3.2.3 pig action, removed information about implementation (SSH), changed 
elements names</li>
+<li>#3.2.4 fs action, removed 'fs-uri' and 'user-name' elements, file system 
URI is now specified in path, user is propagated</li>
+<li>#3.2.6 sub-workflow action, renamed elements 'oozie-url' to 'oozie' and 
'workflow-app' to 'app-path'</li>
+<li>#4 Properties that are valid Java identifiers can be used as ${NAME}</li>
+<li>#4.1 Renamed default properties file from 'configuration.xml' to 
'default-configuration.xml'</li>
+<li>#4.2 Changes in EL Constants and Functions</li>
+<li>#5 Updated notification behavior and tokens</li>
+<li>#6 Changed user propagation behavior</li>
+<li>#7 Changed application packaging from ZIP to HDFS directory</li>
+<li>Removed application lifecycle and self containment model sections</li>
+<li>#10 Changed workflow job recovery, simplified recovery behavior</li>
+<li>#11 Detailed Web Services API</li>
+<li>#12 Updated  Client API section</li>
+<li>#15 Updated  Action Executor API section</li>
+<li>#Appendix A XML namespace updated to 'uri:oozie:workflow:0.1'</li>
+<li>#Appendix A Updated XML schema to changes in map-reduce/pig/fs/ssh 
actions</li>
+<li>#Appendix B Updated workflow example to schema changes</li>
+</ul>
+</p>
+<a name="a2009MAR25"></a>
+</div>
+<div class="section"><h4> 2009MAR25</h4>
+<p><ul><li>Changing all references of HWS to Oozie (project name)</li>
+<li>Typos, XML Formatting</li>
+<li>XML Schema URI correction</li>
+</ul>
+</p>
+<a name="a2009MAR09"></a>
+</div>
+<div class="section"><h4> 2009MAR09</h4>
+<p><ul><li>Changed <tt>CREATED</tt>
+ job state to <tt>PREP</tt>
+ to have same states as Hadoop</li>
+<li>Renamed 'hadoop-workflow' element to 'workflow-app'</li>
+<li>Decision syntax changed to be 'switch/case' with no transition 
indirection</li>
+<li>Action nodes common root element 'action', with the action type as 
sub-element (using a single built-in XML schema)</li>
+<li>Action nodes have 2 explicit transitions 'ok to' and 'error to' enforced 
by XML schema</li>
+<li>Renamed 'fail' action element to 'kill'</li>
+<li>Renamed 'hadoop' action element to 'map-reduce'</li>
+<li>Renamed 'hdfs' action element to 'fs'</li>
+<li>Updated all XML snippets and examples</li>
+<li>Made user propagation simpler and consistent</li>
+<li>Added Oozie XML schema to Appendix A</li>
+<li>Added workflow example to Appendix B</li>
+</ul>
+</p>
+<a name="a2009FEB22"></a>
+</div>
+<div class="section"><h4> 2009FEB22</h4>
+<p><ul><li>Opened <a class="externalLink" 
href="https://issues.apache.org/jira/browse/HADOOP-5303";>JIRA HADOOP-5303</a>
+</li>
+</ul>
+</p>
+<a name="a27DEC2012:"></a>
+</div>
+<div class="section"><h4> 27/DEC/2012:</h4>
+<p><ul><li>Added information on dropping hcatalog table partitions in prepare 
block</li>
+<li>Added hcatalog EL functions section</li>
+</ul>
+</p>
+<a name="a0_Definitions"></a>
+</div>
+</div>
+<div class="section"><h3>0 Definitions</h3>
+<p><b>Action:</b>
+ An execution/computation task (Map-Reduce job, Pig job, a shell command). It 
can also be referred to as a task or an
+'action node'.</p>
+<p><b>Workflow:</b>
+ A collection of actions arranged in a control dependency DAG (Directed 
Acyclic Graph). A &quot;control dependency&quot;
+from one action to another means that the second action can't run until the 
first action has completed.</p>
+<p><b>Workflow Definition:</b>
+ A programmatic description of a workflow that can be executed.</p>
+<p><b>Workflow Definition Language:</b>
+ The language used to define a Workflow Definition.</p>
+<p><b>Workflow Job:</b>
+ An executable instance of a workflow definition.</p>
+<p><b>Workflow Engine:</b>
+ A system that executes workflow jobs. It can also be referred to as a DAG 
engine.</p>
+<a name="a1_Specification_Highlights"></a>
+</div>
+<div class="section"><h3>1 Specification Highlights</h3>
+<p>A Workflow application is a DAG that coordinates the following types of 
actions: Hadoop, Pig, and
+sub-workflows.</p>
+<p>Flow control operations within the workflow applications can be done using 
decision, fork and join nodes. Cycles in
+workflows are not supported.</p>
+<p>Actions and decisions can be parameterized with job properties, action 
outputs (e.g. Hadoop counters) and file information (file exists, file size, 
etc.). Formal parameters are expressed in the workflow
+definition as <tt>${VAR}</tt>
+ variables.</p>
+<p>A Workflow application is a ZIP file that contains the workflow definition 
(an XML file), all the necessary files to
+run all the actions: JAR files for Map/Reduce jobs, shells for streaming 
Map/Reduce jobs, native libraries, Pig
+scripts, and other resource files.</p>
+<p>Before running a workflow job, the corresponding workflow application must 
be deployed in Oozie.</p>
+<p>Deploying a workflow application and running workflow jobs can be done via 
command line tools, a WS API and a Java API.</p>
+<p>Monitoring the system and workflow jobs can be done via a web console, 
command line tools, a WS API and a Java API.</p>
+<p>When submitting a workflow job, a set of properties resolving all the 
formal parameters in the workflow definitions
+must be provided. This set of properties is a Hadoop configuration.</p>
+<p>Possible states for a workflow job are: <tt>PREP</tt>
+, <tt>RUNNING</tt>
+, <tt>SUSPENDED</tt>
+, <tt>SUCCEEDED</tt>
+, <tt>KILLED</tt>
+ and <tt>FAILED</tt>
+.</p>
+<p>In the case of an action start failure in a workflow job, depending on the 
type of failure, Oozie will attempt automatic
+retries, request a manual retry or fail the workflow job.</p>
+<p>Oozie can make HTTP callback notifications on action start/end/failure 
events and workflow end/failure events.</p>
+<p>In the case of workflow job failure, the workflow job can be resubmitted, 
skipping previously completed actions.
+Before resubmission, the workflow application can be updated with a 
patch to fix a problem in the workflow
+application code.</p>
+<p><a name="WorkflowDefinition"></a>
+</p>
+<a name="a2_Workflow_Definition"></a>
+</div>
+<div class="section"><h3>2 Workflow Definition</h3>
+<p>A workflow definition is a DAG with control flow nodes (start, end, 
decision, fork, join, kill) or action nodes
+(map-reduce, pig, etc.), nodes are connected by transitions arrows.</p>
+<p>The workflow definition language is XML based and it is called hPDL (Hadoop 
Process Definition Language).</p>
+<p>Refer to Appendix A for the <a 
href="./WorkflowFunctionalSpec.html#OozieWFSchema">Oozie Workflow Definition 
XML Schema</a>
+. Appendix
+B has <a href="./WorkflowFunctionalSpec.html#OozieWFExamples">Workflow 
Definition Examples</a>
+.</p>
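+<p>For orientation before the detailed sections below, the following is a 
minimal sketch of a complete hPDL workflow
+definition (the node names and the action body are illustrative only):</p>
+<p><pre>
+&lt;workflow-app name=&quot;minimal-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    &lt;start to=&quot;my-action&quot;/&gt;
+    &lt;action name=&quot;my-action&quot;&gt;
+        &lt;map-reduce&gt;
+            ...
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;end&quot;/&gt;
+        &lt;error to=&quot;fail&quot;/&gt;
+    &lt;/action&gt;
+    &lt;kill name=&quot;fail&quot;&gt;
+        &lt;message&gt;Action failed&lt;/message&gt;
+    &lt;/kill&gt;
+    &lt;end name=&quot;end&quot;/&gt;
+&lt;/workflow-app&gt;
+</pre></p>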
+<a name="a2.1_Cycles_in_Workflow_Definitions"></a>
+<div class="section"><h4>2.1 Cycles in Workflow Definitions</h4>
+<p>Oozie does not support cycles in workflow definitions; a workflow definition 
must be a strict DAG.</p>
+<p>At workflow application deployment time, if Oozie detects a cycle in the 
workflow definition it must fail the
+deployment.</p>
+<a name="a3_Workflow_Nodes"></a>
+</div>
+</div>
+<div class="section"><h3>3 Workflow Nodes</h3>
+<p>Workflow nodes are classified into control flow nodes and action nodes:</p>
+<p><ul><li><b>Control flow nodes:</b>
+ nodes that control the start and end of the workflow and workflow job 
execution path.</li>
+<li><b>Action nodes:</b>
+ nodes that trigger the execution of a computation/processing task.</li>
+</ul>
+</p>
+<p>Node names and transitions must conform to the following pattern 
<tt>[a-zA-Z][\-_a-zA-Z0-9]*</tt>, and may be up to 20 characters
+long.</p>
+<a name="a3.1_Control_Flow_Nodes"></a>
+<div class="section"><h4>3.1 Control Flow Nodes</h4>
+<p>Control flow nodes define the beginning and the end of a workflow (the 
<tt>start</tt>
+, <tt>end</tt>
+ and <tt>kill</tt>
+ nodes) and provide a
+mechanism to control the workflow execution path (the <tt>decision</tt>
+, <tt>fork</tt>
+ and <tt>join</tt>
+ nodes).</p>
+<p><a name="StartNode"></a>
+</p>
+<a name="a3.1.1_Start_Control_Node"></a>
+<div class="section"><h5>3.1.1 Start Control Node</h5>
+<p>The <tt>start</tt>
+ node is the entry point for a workflow job; it indicates the first workflow 
node the workflow job must
node the workflow job must
+transition to.</p>
+<p>When a workflow is started, it automatically transitions to the node 
specified in the <tt>start</tt>
+.</p>
+<p>A workflow definition must have one <tt>start</tt>
+ node.</p>
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+  ...
+  &lt;start to=&quot;[NODE-NAME]&quot;/&gt;
+  ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>to</tt>
+ attribute is the name of the first workflow node to execute.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;foo-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;start to=&quot;firstHadoopJob&quot;/&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><a name="EndNode"></a>
+</p>
+<a name="a3.1.2_End_Control_Node"></a>
+</div>
+<div class="section"><h5>3.1.2 End Control Node</h5>
+<p>The <tt>end</tt>
+ node is the end of a workflow job; it indicates that the workflow job has 
completed successfully.</p>
+<p>When a workflow job reaches the <tt>end</tt>
+ node it finishes successfully (SUCCEEDED).</p>
+<p>If one or more actions started by the workflow job are executing when the 
<tt>end</tt>
+ node is reached, the actions will be
+killed. In this scenario the workflow job is still considered to have run 
successfully.</p>
+<p>A workflow definition must have one <tt>end</tt>
+ node.</p>
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;end name=&quot;[NODE-NAME]&quot;/&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>name</tt>
+ attribute is the name that other nodes transition to in order to end the workflow job.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;foo-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;end name=&quot;end&quot;/&gt;
+&lt;/workflow-app&gt;
+</pre></p>
+<p><a name="KillNode"></a>
+</p>
+<a name="a3.1.3_Kill_Control_Node"></a>
+</div>
+<div class="section"><h5>3.1.3 Kill Control Node</h5>
+<p>The <tt>kill</tt>
+ node allows a workflow job to kill itself.</p>
+<p>When a workflow job reaches the <tt>kill</tt>
+ node it finishes in error (KILLED).</p>
+<p>If one or more actions started by the workflow job are executing when the 
<tt>kill</tt>
+ node is reached, the actions will be
+killed.</p>
+<p>A workflow definition may have zero or more <tt>kill</tt>
+ nodes.</p>
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;kill name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;message&gt;[MESSAGE-TO-LOG]&lt;/message&gt;
+    &lt;/kill&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>name</tt>
+ attribute in the <tt>kill</tt>
+ node is the name of the Kill action node.</p>
+<p>The content of the <tt>message</tt>
+ element will be logged as the kill reason for the workflow job.</p>
+<p>A <tt>kill</tt>
+ node does not have transition elements because it ends the workflow job, as 
<tt>KILLED</tt>
+.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;foo-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;kill name=&quot;killBecauseNoInput&quot;&gt;
+        &lt;message&gt;Input unavailable&lt;/message&gt;
+    &lt;/kill&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><a name="DecisionNode"></a>
+</p>
+<a name="a3.1.4_Decision_Control_Node"></a>
+</div>
+<div class="section"><h5>3.1.4 Decision Control Node</h5>
+<p>A <tt>decision</tt>
+ node enables a workflow to make a selection on the execution path to 
follow.</p>
+<p>The behavior of a <tt>decision</tt>
+ node can be seen as a switch-case statement.</p>
+<p>A <tt>decision</tt>
+ node consists of a list of predicate-transition pairs plus a default 
transition. Predicates are evaluated
+in order of appearance until one of them evaluates to <tt>true</tt>
+ and the corresponding transition is taken. If none of the
+predicates evaluates to <tt>true</tt>
+ the <tt>default</tt>
+ transition is taken.</p>
+<p>Predicates are JSP Expression Language (EL) expressions (refer to section 
4.2 of this document) that resolve into a
+boolean value, <tt>true</tt>
+ or <tt>false</tt>
+. For example:</p>
+<p><pre>
+    ${fs:fileSize('/usr/foo/myinputdir') gt 10 * GB}
+</pre></p>
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;decision name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;switch&gt;
+            &lt;case to=&quot;[NODE_NAME]&quot;&gt;[PREDICATE]&lt;/case&gt;
+            ...
+            &lt;case to=&quot;[NODE_NAME]&quot;&gt;[PREDICATE]&lt;/case&gt;
+            &lt;default to=&quot;[NODE_NAME]&quot;/&gt;
+        &lt;/switch&gt;
+    &lt;/decision&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>name</tt>
+ attribute in the <tt>decision</tt>
+ node is the name of the decision node.</p>
+<p>Each <tt>case</tt>
+ element contains a predicate and a transition name. The predicate ELs are 
evaluated
+in order until one returns <tt>true</tt>
+ and the corresponding transition is taken.</p>
+<p>The <tt>default</tt>
+ element indicates the transition to take if none of the predicates evaluates
+to <tt>true</tt>
+.</p>
+<p>All decision nodes must have a <tt>default</tt>
+ element to avoid bringing the workflow into an error
+state if none of the predicates evaluates to true.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;foo-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;decision name=&quot;mydecision&quot;&gt;
+        &lt;switch&gt;
+            &lt;case to=&quot;reconsolidatejob&quot;&gt;
+              ${fs:fileSize(secondjobOutputDir) gt 10 * GB}
+            &lt;/case&gt; &lt;case to=&quot;rexpandjob&quot;&gt;
+              ${fs:fileSize(secondjobOutputDir) lt 100 * MB}
+            &lt;/case&gt;
+            &lt;case to=&quot;recomputejob&quot;&gt;
+              ${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
+            &lt;/case&gt;
+            &lt;default to=&quot;end&quot;/&gt;
+        &lt;/switch&gt;
+    &lt;/decision&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><a name="ForkJoinNodes"></a>
+</p>
+<a name="a3.1.5_Fork_and_Join_Control_Nodes"></a>
+</div>
+<div class="section"><h5>3.1.5 Fork and Join Control Nodes</h5>
+<p>A <tt>fork</tt>
+ node splits one path of execution into multiple concurrent paths of 
execution.</p>
+<p>A <tt>join</tt>
+ node waits until every concurrent execution path of a previous <tt>fork</tt>
+ node arrives at it.</p>
+<p>The <tt>fork</tt>
+ and <tt>join</tt>
+ nodes must be used in pairs. The <tt>join</tt>
+ node assumes concurrent execution paths are children of
+the same <tt>fork</tt>
+ node.</p>
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;fork name=&quot;[FORK-NODE-NAME]&quot;&gt;
+        &lt;path start=&quot;[NODE-NAME]&quot; /&gt;
+        ...
+        &lt;path start=&quot;[NODE-NAME]&quot; /&gt;
+    &lt;/fork&gt;
+    ...
+    &lt;join name=&quot;[JOIN-NODE-NAME]&quot; to=&quot;[NODE-NAME]&quot; /&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>name</tt>
+ attribute in the <tt>fork</tt>
+ node is the name of the workflow fork node. The <tt>start</tt>
+ attribute in each <tt>path</tt>
+ element in the <tt>fork</tt>
+ node indicates the name of the workflow node that will be part of the 
concurrent execution paths.</p>
+<p>The <tt>name</tt>
+ attribute in the <tt>join</tt>
+ node is the name of the workflow join node. The <tt>to</tt>
+ attribute in the <tt>join</tt>
+ node
+indicates the name of the workflow node that will be executed after all 
concurrent execution paths of the corresponding
+fork arrive at the join node.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;fork name=&quot;forking&quot;&gt;
+        &lt;path start=&quot;firstparalleljob&quot;/&gt;
+        &lt;path start=&quot;secondparalleljob&quot;/&gt;
+    &lt;/fork&gt;
+    &lt;action name=&quot;firstparalleljob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;job-xml&gt;job1.xml&lt;/job-xml&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;joining&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+    &lt;action name=&quot;secondparalleljob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;job-xml&gt;job2.xml&lt;/job-xml&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;joining&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+    &lt;join name=&quot;joining&quot; to=&quot;nextaction&quot;/&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>By default, Oozie performs some validation that any forking in a workflow 
is valid and won't lead to any incorrect behavior or
+instability.  However, if Oozie is preventing a workflow from being submitted 
and you are very certain that it should work, you can
+disable forkjoin validation so that Oozie will accept the workflow.  To 
disable this validation just for a specific workflow, simply
+set <tt>oozie.wf.validate.ForkJoin</tt>
+ to <tt>false</tt>
+ in the job.properties file.  To disable this validation for all workflows, 
simply set
+<tt>oozie.validate.ForkJoin</tt> to <tt>false</tt>
+ in the oozie-site.xml file.  Disabling this validation is determined by the 
AND of both of
+these properties, so it will be disabled if either or both are set to false 
and only enabled if both are set to true (or not
+specified).</p>
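+<p>As a concrete sketch of the two settings just described (file locations as 
conventionally used by Oozie):</p>
+<p><pre>
+# in job.properties, per workflow:
+oozie.wf.validate.ForkJoin=false
+
+&lt;!-- in oozie-site.xml, for all workflows: --&gt;
+&lt;property&gt;
+    &lt;name&gt;oozie.validate.ForkJoin&lt;/name&gt;
+    &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+</pre></p>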
+<p><a name="ActionNodes"></a>
+</p>
+<a name="a3.2_Workflow_Action_Nodes"></a>
+</div>
+</div>
+<div class="section"><h4>3.2 Workflow Action Nodes</h4>
+<p>Action nodes are the mechanism by which a workflow triggers the execution 
of a computation/processing task.</p>
+<a name="a3.2.1_Action_Basis"></a>
+<div class="section"><h5>3.2.1 Action Basis</h5>
+<p>The following sub-sections define common behavior and capabilities for all 
action types.</p>
+<a name="a3.2.1.1_Action_ComputationProcessing_Is_Always_Remote"></a>
+<div class="section"><h6>3.2.1.1 Action Computation/Processing Is Always 
Remote</h6>
+<p>All computation/processing tasks triggered by an action node are remote to 
Oozie. No workflow application specific
+computation/processing task is executed within Oozie.</p>
+<a name="a3.2.1.2_Actions_Are_Asynchronous"></a>
+</div>
+<div class="section"><h6>3.2.1.2 Actions Are Asynchronous</h6>
+<p>All computation/processing tasks triggered by an action node are executed 
asynchronously by Oozie. For most types of
+computation/processing tasks triggered by a workflow action, the workflow job 
has to wait until the
+computation/processing task completes before transitioning to the following 
node in the workflow.</p>
+<p>The exception is the <tt>fs</tt>
+ action that is handled as a synchronous action.</p>
+<p>Oozie can detect completion of computation/processing tasks by two 
different means, callbacks and polling.</p>
+<p>When a computation/processing task is started by Oozie, Oozie provides a 
unique callback URL to the task; the task
+should invoke the given URL to notify its completion.</p>
+<p>For cases where the task fails to invoke the callback URL for any reason 
(e.g. a transient network failure) or when
+the type of task cannot invoke the callback URL upon completion, Oozie has a 
mechanism to poll computation/processing
+tasks for completion.</p>
+<a name="a3.2.1.3_Actions_Have_2_Transitions_ok_and_error"></a>
+</div>
+<div class="section"><h6>3.2.1.3 Actions Have 2 Transitions, =ok= and 
=error=</h6>
+<p>If a computation/processing task -triggered by a workflow- completes 
successfully, it transitions to <tt>ok</tt>
+.</p>
+<p>If a computation/processing task -triggered by a workflow- fails to 
complete successfully, it transitions to <tt>error</tt>
+.</p>
+<p>If a computation/processing task exits in error, the 
computation/processing task must provide <tt>error-code</tt>
+ and
+ <tt>error-message</tt>
+ information to Oozie. This information can be used from <tt>decision</tt>
+ nodes to implement fine-grained
+error handling at the workflow application level.</p>
+<p>Each action type must clearly define all the error codes it can produce.</p>
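+<p>As a sketch, the two transitions on an action node, with the error 
transition routed to a <tt>decision</tt>
+ node that inspects the error code via the <tt>wf:errorCode</tt>
+ EL function from section 4.2.3 (node names and the error code value are 
illustrative):</p>
+<p><pre>
+&lt;action name=&quot;my-action&quot;&gt;
+    ...
+    &lt;ok to=&quot;next-node&quot;/&gt;
+    &lt;error to=&quot;check-error&quot;/&gt;
+&lt;/action&gt;
+&lt;decision name=&quot;check-error&quot;&gt;
+    &lt;switch&gt;
+        &lt;case to=&quot;cleanup-and-retry&quot;&gt;${wf:errorCode(&quot;my-action&quot;) eq &quot;JA018&quot;}&lt;/case&gt;
+        &lt;default to=&quot;fail&quot;/&gt;
+    &lt;/switch&gt;
+&lt;/decision&gt;
+</pre></p>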
+<a name="a3.2.1.4_Action_Recovery"></a>
+</div>
+<div class="section"><h6>3.2.1.4 Action Recovery</h6>
+<p>Oozie provides recovery capabilities when starting or ending actions.</p>
+<p>Once an action starts successfully, Oozie will not retry starting the action 
if the action fails during its execution.
+The assumption is that the external system (e.g. Hadoop) executing the action 
has enough resilience to recover jobs
+once they have started (e.g. Hadoop task retries).</p>
+<p>Java actions are a special case with regard to retries.  Although Oozie 
itself does not retry Java actions
+should they fail after they have successfully started, Hadoop itself can cause 
the action to be restarted due to a
+map task retry on the map task running the Java application.  See the Java 
Action section below for more detail.</p>
+<p>For failures that occur prior to the start of the job, Oozie will have 
different recovery strategies depending on the
+nature of the failure.</p>
+<p>If the failure is of a transient nature, Oozie will perform retries after a 
pre-defined time interval. The number of
+retries and the time interval for a type of action must be pre-configured at 
the Oozie level. Workflow jobs can override such
+configuration.</p>
+<p>Examples of transient failures are network problems or a remote system 
being temporarily unavailable.</p>
+<p>If the failure is of a non-transient nature, Oozie will suspend the workflow 
job until a manual or programmatic
+intervention resumes the workflow job and the action start or end is retried. 
It is the responsibility of an
+administrator or an external managing system to perform any necessary cleanup 
before resuming the workflow job.</p>
+<p>If the failure is an error and a retry will not resolve the problem, Oozie 
will perform the error transition for the
+action.</p>
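+<p>For failures covered by the User-Retry feature described in section 18, a 
workflow can request its own retries
+directly on the action element; a sketch (attribute values illustrative):</p>
+<p><pre>
+&lt;action name=&quot;my-action&quot; retry-max=&quot;3&quot; retry-interval=&quot;1&quot;&gt;
+    ...
+&lt;/action&gt;
+</pre></p>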
+<p><a name="MapReduceAction"></a>
+</p>
+<a name="a3.2.2_Map-Reduce_Action"></a>
+</div>
+</div>
+<div class="section"><h5>3.2.2 Map-Reduce Action</h5>
+<p>The <tt>map-reduce</tt>
+ action starts a Hadoop map/reduce job from a workflow. Hadoop jobs can be 
Java Map/Reduce jobs or
+streaming jobs.</p>
+<p>A <tt>map-reduce</tt>
+ action can be configured to perform file system cleanup and directory 
creation before starting the
+map reduce job. This capability enables Oozie to retry a Hadoop job in the 
situation of a transient failure (Hadoop
+checks the non-existence of the job output directory and then creates it when 
the Hadoop job is starting, thus a retry
+without cleanup of the job output directory would fail).</p>
+<p>The workflow job will wait until the Hadoop map/reduce job completes before 
continuing to the next action in the
+workflow execution path.</p>
+<p>The counters of the Hadoop job and job exit status (<tt>FAILED</tt>, 
<tt>KILLED</tt>
+ or <tt>SUCCEEDED</tt>
+) must be available to the
+workflow job after the Hadoop job ends. This information can be used from 
within decision nodes and other action
+configurations.</p>
+<p>The <tt>map-reduce</tt>
+ action has to be configured with all the necessary Hadoop JobConf properties 
to run the Hadoop
+map/reduce job.</p>
+<p>Hadoop JobConf properties can be specified as part of:<ul><li>the 
<tt>config-default.xml</tt>
+ or</li>
+<li>JobConf XML file bundled with the workflow application or</li>
+<li>the &lt;global&gt; tag in the workflow definition or</li>
+<li>Inline <tt>map-reduce</tt>
+ action configuration or</li>
+<li>An implementation of OozieActionConfigurator specified by the 
&lt;config-class&gt; tag in the workflow definition.</li>
+</ul>
+</p>
+<p>The configuration properties are loaded in the above order, i.e. 
<tt>streaming</tt>
+, <tt>job-xml</tt>
+, <tt>configuration</tt>
+,
+and <tt>config-class</tt>
+; later values override earlier values.</p>
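+<p>As a sketch of the <tt>global</tt>
+ option in that list (covered in detail in section 19), properties shared by 
several actions can be declared once
+instead of being repeated in each inline configuration:</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    &lt;global&gt;
+        &lt;resource-manager&gt;[RESOURCE-MANAGER]&lt;/resource-manager&gt;
+        &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+        &lt;configuration&gt;
+            &lt;property&gt;
+                &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+            &lt;/property&gt;
+        &lt;/configuration&gt;
+    &lt;/global&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>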
+<p>Streaming and inline property values can be parameterized (templatized) 
using EL expressions.</p>
+<p>The Hadoop <tt>mapred.job.tracker</tt>
+ and <tt>fs.default.name</tt>
+ properties must not be present in the job-xml and inline
+configuration.</p>
+<p><a name="FilesArchives"></a>
+</p>
+<a name="a3.2.2.1_Adding_Files_and_Archives_for_the_Job"></a>
+<div class="section"><h6>3.2.2.1 Adding Files and Archives for the Job</h6>
+<p>The <tt>file</tt>
+, <tt>archive</tt>
+ elements make files and archives available to map-reduce jobs. If the 
specified path is
+relative, it is assumed the file or archive is within the application 
directory, in the corresponding sub-path.
+If the path is absolute, the file or archive is expected at the given 
absolute path.</p>
+<p>Files specified with the <tt>file</tt>
+ element will be symbolic links in the home directory of the task.</p>
+<p>If a file is a native library (an '.so' or a '.so.#' file), it will be 
symlinked as an '.so' file in the task running
+directory, thus available to the task JVM.</p>
+<p>To force a symlink for a file on the task running directory, use a '#' 
followed by the symlink name. For example
+'mycat.sh#cat'.</p>
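+<p>A short sketch of both elements inside a <tt>map-reduce</tt>
+ action (the paths are illustrative):</p>
+<p><pre>
+&lt;map-reduce&gt;
+    ...
+    &lt;file&gt;/apps/myapp/mycat.sh#cat&lt;/file&gt;
+    &lt;archive&gt;/apps/myapp/mylib.zip#mylib&lt;/archive&gt;
+&lt;/map-reduce&gt;
+</pre></p>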
+<p>Refer to the Hadoop distributed cache documentation for more details on 
files and archives.</p>
+<a name="a3.2.2.2_Configuring_the_MapReduce_action_with_Java_code"></a>
+</div>
+<div class="section"><h6>3.2.2.2 Configuring the MapReduce action with Java 
code</h6>
+<p>Java code can be used to further configure the MapReduce action.  This can 
be useful if you already have &quot;driver&quot; code for your
+MapReduce action, if you're more familiar with MapReduce's Java API, if 
there's some configuration that requires logic, or some
+configuration that's difficult to do in straight XML (e.g. Avro).</p>
+<p>Create a class that implements the 
org.apache.oozie.action.hadoop.OozieActionConfigurator interface from the 
&quot;oozie-sharelib-oozie&quot;
+artifact.  It contains a single method that receives a <tt>JobConf</tt>
+ as an argument.  Any configuration properties set on this <tt>JobConf</tt>
+
+will be used by the MapReduce action.</p>
+<p>The OozieActionConfigurator has this signature:
+<pre>
+public interface OozieActionConfigurator {
+    public void configure(JobConf actionConf) throws 
OozieActionConfiguratorException;
+}
+</pre>
+where <tt>actionConf</tt>
+ is the <tt>JobConf</tt>
+ you can update.  If you need to throw an Exception, you can wrap it in
+an <tt>OozieActionConfiguratorException</tt>
+, also in the &quot;oozie-sharelib-oozie&quot; artifact.</p>
+<p>For example:
+<pre>
+package com.example;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileInputFormat;
+import org.apache.hadoop.mapred.FileOutputFormat;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.oozie.action.hadoop.OozieActionConfigurator;
+import org.apache.oozie.action.hadoop.OozieActionConfiguratorException;
+import org.apache.oozie.example.SampleMapper;
+import org.apache.oozie.example.SampleReducer;
+public class MyConfigClass implements OozieActionConfigurator {
+    @Override
+    public void configure(JobConf actionConf) throws 
OozieActionConfiguratorException {
+        if (actionConf.getUser() == null) {
+            throw new OozieActionConfiguratorException(&quot;No user 
set&quot;);
+        }
+        actionConf.setMapperClass(SampleMapper.class);
+        actionConf.setReducerClass(SampleReducer.class);
+        FileInputFormat.setInputPaths(actionConf, new Path(&quot;/user/&quot; 
+ actionConf.getUser() + &quot;/input-data&quot;));
+        FileOutputFormat.setOutputPath(actionConf, new Path(&quot;/user/&quot; 
+ actionConf.getUser() + &quot;/output&quot;));
+        ...
+    }
+}
+</pre>
+</p>
+<p>To use your config class in your MapReduce action, simply compile it into a 
jar, make the jar available to your action, and specify
+the class name in the <tt>config-class</tt>
+ element (this requires at least schema 0.5):
+<pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;map-reduce&gt;
+            ...
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;config-class&gt;com.example.MyConfigClass&lt;/config-class&gt;
+            ...
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>Another example of this can be found in the &quot;map-reduce&quot; example 
that comes with Oozie.</p>
+<p>A useful tip: The initial <tt>JobConf</tt>
+ passed to the <tt>configure</tt>
+ method includes all of the properties listed in the <tt>configuration</tt>
+
+section of the MR action in a workflow.  If you need to pass any information 
to your OozieActionConfigurator, you can simply put
+it there.</p>
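+<p>For example, a hypothetical property <tt>my.example.setting</tt>
+ set in the action's <tt>configuration</tt>
+ section could be read back inside <tt>configure</tt>
+ like this (a sketch; the property name is made up):</p>
+<p><pre>
+@Override
+public void configure(JobConf actionConf) throws OozieActionConfiguratorException {
+    // The value arrives via the &lt;configuration&gt; section of the MR action.
+    String setting = actionConf.get(&quot;my.example.setting&quot;);
+    if (setting == null) {
+        throw new OozieActionConfiguratorException(&quot;my.example.setting not set&quot;);
+    }
+    // ... use the value to drive further JobConf setup ...
+}
+</pre></p>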
+<p><a name="StreamingMapReduceAction"></a>
+</p>
+<a name="a3.2.2.3_Streaming"></a>
+</div>
+<div class="section"><h6>3.2.2.3 Streaming</h6>
+<p>Streaming information can be specified in the <tt>streaming</tt>
+ element.</p>
+<p>The <tt>mapper</tt>
+ and <tt>reducer</tt>
+ elements are used to specify the executable/script to be used as mapper and 
reducer.</p>
+<p>User defined scripts must be bundled with the workflow application and they 
must be declared in the <tt>files</tt>
+ element of
+the streaming configuration. If they are not declared in the <tt>files</tt>
+ element of the configuration it is assumed they
+will be available (and in the command PATH) on the Hadoop slave machines.</p>
+<p>Some streaming jobs require files found on HDFS to be available to the 
mapper/reducer scripts. This is done using
+the <tt>file</tt>
+ and <tt>archive</tt>
+ elements described in the previous section.</p>
+<p>The Mapper/Reducer can be overridden by the <tt>mapred.mapper.class</tt>
+ or <tt>mapred.reducer.class</tt>
+ properties in the <tt>job-xml</tt>
+
+file or <tt>configuration</tt>
+ elements.</p>
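+<p>A sketch of such an override in the action's <tt>configuration</tt>
+ element (the class names are illustrative):</p>
+<p><pre>
+&lt;configuration&gt;
+    &lt;property&gt;
+        &lt;name&gt;mapred.mapper.class&lt;/name&gt;
+        &lt;value&gt;com.example.MyStreamingMapper&lt;/value&gt;
+    &lt;/property&gt;
+    &lt;property&gt;
+        &lt;name&gt;mapred.reducer.class&lt;/name&gt;
+        &lt;value&gt;com.example.MyStreamingReducer&lt;/value&gt;
+    &lt;/property&gt;
+&lt;/configuration&gt;
+</pre></p>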
+<p><a name="PipesMapReduceAction"></a>
+</p>
+<a name="a3.2.2.4_Pipes"></a>
+</div>
+<div class="section"><h6>3.2.2.4 Pipes</h6>
+<p>Pipes information can be specified in the <tt>pipes</tt>
+ element.</p>
+<p>A subset of the command line options which can be used while using the 
Hadoop Pipes Submitter can be specified
+via elements - <tt>map</tt>
+, <tt>reduce</tt>
+, <tt>inputformat</tt>
+, <tt>partitioner</tt>
+, <tt>writer</tt>
+, <tt>program</tt>
+.</p>
+<p>The <tt>program</tt>
+ element is used to specify the executable/script to be used.</p>
+<p>The user-defined program must be bundled with the workflow application.</p>
+<p>Some pipes jobs require files found on HDFS to be available to the 
mapper/reducer scripts. This is done using
+the <tt>file</tt>
+ and <tt>archive</tt>
+ elements described in the previous section.</p>
+<p>Pipe properties can be overridden by specifying them in the <tt>job-xml</tt>
+ file or <tt>configuration</tt>
+ element.</p>
+<a name="a3.2.2.5_Syntax"></a>
+</div>
+<div class="section"><h6>3.2.2.5 Syntax</h6>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;[RESOURCE-MANAGER]&lt;/resource-manager&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;[PATH]&quot;/&gt;
+                ...
+                &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+                ...
+            &lt;/prepare&gt;
+            &lt;streaming&gt;
+                &lt;mapper&gt;[MAPPER-PROCESS]&lt;/mapper&gt;
+                &lt;reducer&gt;[REDUCER-PROCESS]&lt;/reducer&gt;
+                
&lt;record-reader&gt;[RECORD-READER-CLASS]&lt;/record-reader&gt;
+                
&lt;record-reader-mapping&gt;[NAME=VALUE]&lt;/record-reader-mapping&gt;
+                ...
+                &lt;env&gt;[NAME=VALUE]&lt;/env&gt;
+                ...
+            &lt;/streaming&gt;
+                       &lt;!-- Either streaming or pipes can be specified for 
an action, not both --&gt;
+            &lt;pipes&gt;
+                &lt;map&gt;[MAPPER]&lt;/map&gt;
+                &lt;reduce&gt;[REDUCER]&lt;/reduce&gt;
+                &lt;inputformat&gt;[INPUTFORMAT]&lt;/inputformat&gt;
+                &lt;partitioner&gt;[PARTITIONER]&lt;/partitioner&gt;
+                &lt;writer&gt;[OUTPUTFORMAT]&lt;/writer&gt;
+                &lt;program&gt;[EXECUTABLE]&lt;/program&gt;
+            &lt;/pipes&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;config-class&gt;com.example.MyConfigClass&lt;/config-class&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/map-reduce&gt;        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre>
+</p>
+<p>The <tt>prepare</tt>
+ element, if present, indicates a list of paths to delete before starting the 
job. This should be used
+exclusively for directory cleanup or dropping of hcatalog table or table 
partitions for the job to be executed. The delete operation
+will be performed in the <tt>fs.default.name</tt>
+ filesystem for hdfs URIs. The format for specifying hcatalog table URI is
+hcat://[metastore server]:[port]/[database name]/[table name] and format to 
specify a hcatalog table partition URI is
+hcat://[metastore server]:[port]/[database name]/[table 
name]/[partkey1]=[value];[partkey2]=[value].
+In case of a hcatalog URI, the hive-site.xml needs to be shipped using 
<tt>file</tt>
+ tag and the hcatalog and hive jars
+need to be placed in workflow lib directory or specified using <tt>archive</tt>
+ tag.</p>
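+<p>Putting the above URI formats together, a sketch of a <tt>prepare</tt>
+ block that removes an HDFS directory and drops an hcatalog table partition 
(host, port, database, table and partition
+values are illustrative):</p>
+<p><pre>
+&lt;prepare&gt;
+    &lt;delete path=&quot;hdfs://foo:8020/usr/tucu/temp-data&quot;/&gt;
+    &lt;delete path=&quot;hcat://foo:11002/mydb/mytable/region=us;dt=20180409&quot;/&gt;
+&lt;/prepare&gt;
+</pre></p>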
+<p>The <tt>job-xml</tt>
+ element, if present, must refer to a Hadoop JobConf <tt>job.xml</tt>
+ file bundled in the workflow application.
+By default the <tt>job.xml</tt>
+ file is taken from the workflow application namenode, regardless of the namenode 
specified for the action.
+To specify a <tt>job.xml</tt>
+ on another namenode use a fully qualified file path.
+The <tt>job-xml</tt>
+ element is optional and as of schema 0.4, multiple <tt>job-xml</tt>
+ elements are allowed in order to specify multiple Hadoop JobConf 
<tt>job.xml</tt>
+ files.</p>
+<p>The <tt>configuration</tt>
+ element, if present, contains JobConf properties for the Hadoop job.</p>
+<p>Properties specified in the <tt>configuration</tt>
+ element override properties specified in the file specified in the
+ <tt>job-xml</tt>
+ element.</p>
+<p>As of schema 0.5, the <tt>config-class</tt>
+ element, if present, contains a class that implements OozieActionConfigurator 
that can be used
+to further configure the MapReduce job.</p>
+<p>Properties specified in the <tt>config-class</tt>
+ class override properties specified in <tt>configuration</tt>
+ element.</p>
+<p>External Stats can be turned on/off by specifying the property 
<i>oozie.action.external.stats.write</i>
+ as <i>true</i>
+ or <i>false</i>
+ in the configuration element of workflow.xml. The default value for this 
property is <i>false</i>
+.</p>
+<p>The <tt>file</tt>
+ element, if present, must specify the target symbolic link for binaries by 
separating the original file and target with a # (file#target-sym-link). This 
is not required for libraries.</p>
+<p>The <tt>mapper</tt>
+ and <tt>reducer</tt>
+ processes for streaming jobs should specify the executable command with URL 
encoding, e.g. '%' should be replaced by '%25'.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;foo-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;myfirstHadoopJob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete 
path=&quot;hdfs://foo:8020/usr/tucu/output-data&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;/myfirstjob.xml&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.input.dir&lt;/name&gt;
+                    &lt;value&gt;/usr/tucu/input-data&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.output.dir&lt;/name&gt;
+                    &lt;value&gt;/usr/tucu/output-data&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.reduce.tasks&lt;/name&gt;
+                    &lt;value&gt;${firstJobReducers}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;oozie.action.external.stats.write&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;myNextAction&quot;/&gt;
+        &lt;error to=&quot;errorCleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>In the above example, the number of Reducers to be used by the Map/Reduce 
job has to be specified as a parameter of
+the workflow job configuration when creating the workflow job.</p>
+<p><b>Streaming Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;firstjob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${output}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;streaming&gt;
+                &lt;mapper&gt;/bin/bash testarchive/bin/mapper.sh 
testfile&lt;/mapper&gt;
+                &lt;reducer&gt;/bin/bash 
testarchive/bin/reducer.sh&lt;/reducer&gt;
+            &lt;/streaming&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.input.dir&lt;/name&gt;
+                    &lt;value&gt;${input}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.output.dir&lt;/name&gt;
+                    &lt;value&gt;${output}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;stream.num.map.output.key.fields&lt;/name&gt;
+                    &lt;value&gt;3&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;file&gt;/users/blabla/testfile.sh#testfile&lt;/file&gt;
+            
&lt;archive&gt;/users/blabla/testarchive.jar#testarchive&lt;/archive&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;end&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+  ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><b>Pipes Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;firstjob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${output}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;pipes&gt;
+                
&lt;program&gt;bin/wordcount-simple#wordcount-simple&lt;/program&gt;
+            &lt;/pipes&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.input.dir&lt;/name&gt;
+                    &lt;value&gt;${input}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.output.dir&lt;/name&gt;
+                    &lt;value&gt;${output}&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            
&lt;archive&gt;/users/blabla/testarchive.jar#testarchive&lt;/archive&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;end&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+  ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><a name="PigAction"></a>
+</p>
+<a name="a3.2.3_Pig_Action"></a>
+</div>
+</div>
+<div class="section"><h5>3.2.3 Pig Action</h5>
+<p>The <tt>pig</tt>
+ action starts a Pig job.</p>
+<p>The workflow job will wait until the pig job completes before continuing to 
the next action.</p>
+<p>The <tt>pig</tt>
+ action has to be configured with the resource-manager, name-node, pig script 
and the necessary parameters and
+configuration to run the Pig job.</p>
+<p>A <tt>pig</tt>
+ action can be configured to perform HDFS files/directories cleanup or 
HCatalog partitions cleanup before
+starting the Pig job. This capability enables Oozie to retry a Pig job in the 
situation of a transient failure (Pig
+creates temporary directories for intermediate data, thus a retry without 
cleanup would fail).</p>
+<p>Hadoop JobConf properties can be specified as part of:
+<ul>
+<li>the <tt>config-default.xml</tt> or</li>
+<li>a JobConf XML file bundled with the workflow application or</li>
+<li>the <tt>&lt;global&gt;</tt> tag in the workflow definition or</li>
+<li>the inline <tt>pig</tt> action configuration.</li>
+</ul>
+</p>
+<p>The configuration properties are loaded in the order listed above, i.e.
+<tt>job-xml</tt>
+ and then <tt>configuration</tt>
+, and
+later values override earlier ones.</p>
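+<p>As an illustration of this &quot;later overrides earlier&quot; rule (a
+sketch, not Oozie code; the file names are hypothetical), Hadoop's
+<tt>Configuration</tt> class behaves the same way when resources are added in
+order:</p>
+<p><pre>
+// resources added later override values from resources added earlier
+Configuration conf = new Configuration(false);
+conf.addResource(new Path(&quot;file:///&quot;, &quot;job-xml-file.xml&quot;));  // loaded first
+conf.addResource(new Path(&quot;file:///&quot;, &quot;inline-config.xml&quot;)); // loaded later, wins on conflicts
+String reducers = conf.get(&quot;mapred.reduce.tasks&quot;);            // value from inline-config.xml if set in both
+</pre></p>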
+<p>Inline property values can be parameterized (templatized) using EL 
expressions.</p>
+<p>The YARN <tt>yarn.resourcemanager.address</tt>
+ and HDFS <tt>fs.default.name</tt>
+ properties must not be present in the job-xml and inline
+configuration.</p>
+<p>As with Hadoop map-reduce jobs, it is possible to make files and archives
+available to the Pig job; refer to
+the section <a href="#FilesArchives">Adding Files and Archives for the Job</a>.</p>
+<p><b>Syntax for Pig actions in Oozie schema 1.0:</b>
+
+<pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;pig&gt;
+            &lt;resource-manager&gt;[RESOURCE-MANAGER]&lt;/resource-manager&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;script&gt;[PIG-SCRIPT]&lt;/script&gt;
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+                ...
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+                ...
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/pig&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><b>Syntax for Pig actions in Oozie schema 0.2:</b>
+
+<pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:0.2&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;[JOB-TRACKER]&lt;/job-tracker&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;script&gt;[PIG-SCRIPT]&lt;/script&gt;
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+                ...
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+                ...
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/pig&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><b>Syntax for Pig actions in Oozie schema 0.1:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:0.1&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;[JOB-TRACKER]&lt;/job-tracker&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;script&gt;[PIG-SCRIPT]&lt;/script&gt;
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+                ...
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/pig&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>prepare</tt>
+ element, if present, indicates a list of paths to delete before starting the
+job. It should be used
+exclusively for directory cleanup or for dropping HCatalog tables or table
+partitions before the job is executed. The delete operation
+will be performed in the <tt>fs.default.name</tt>
+ filesystem for HDFS URIs. The format for specifying an HCatalog table URI is
+hcat://[metastore server]:[port]/[database name]/[table name] and the format
+for specifying an HCatalog table partition URI is
+hcat://[metastore server]:[port]/[database name]/[table
+name]/[partkey1]=[value];[partkey2]=[value].
+In case of an HCatalog URI, the hive-site.xml needs to be shipped using the
+<tt>file</tt>
+ tag, and the HCatalog and Hive jars
+need to be placed in the workflow lib directory or specified using the
+<tt>archive</tt>
+ tag.</p>
+<p>The <tt>job-xml</tt>
+ element, if present, must refer to a Hadoop JobConf <tt>job.xml</tt>
+ file bundled in the workflow application.
+The <tt>job-xml</tt>
+ element is optional and as of schema 0.4, multiple <tt>job-xml</tt>
+ elements are allowed in order to specify multiple Hadoop JobConf 
<tt>job.xml</tt>
+ files.</p>
+<p>The <tt>configuration</tt>
+ element, if present, contains JobConf properties for the underlying Hadoop 
jobs.</p>
+<p>Properties specified in the <tt>configuration</tt>
+ element override properties specified in the file specified in the
+ <tt>job-xml</tt>
+ element.</p>
+<p>External Stats can be turned on/off by specifying the property 
<i>oozie.action.external.stats.write</i>
+ as <i>true</i>
+ or <i>false</i>
+ in the configuration element of workflow.xml. The default value for this 
property is <i>false</i>
+.</p>
+<p>The inline and job-xml configuration properties are passed to the Hadoop 
jobs submitted by Pig runtime.</p>
+<p>The <tt>script</tt>
+ element contains the pig script to execute. The pig script can be templatized 
with variables of the
+form <tt>${VARIABLE}</tt>
+. The values of these variables can then be specified using the <tt>param</tt>
+ elements.</p>
+<p>NOTE: Oozie will perform the parameter substitution before firing the pig 
job. This is different from the
+<a class="externalLink" 
href="http://wiki.apache.org/pig/ParameterSubstitution";>parameter substitution 
mechanism provided by Pig</a>
+, which has a
+few limitations.</p>
+<p>The <tt>param</tt>
+ elements, if present, contain parameters to be passed to the pig script.</p>
+<p><b>In Oozie schema 0.2:</b>
+
+The <tt>argument</tt>
+ elements, if present, contain arguments to be passed to the pig script.</p>
+<p>All the above elements can be parameterized (templatized) using EL 
expressions.</p>
+<p><b>Example for Oozie schema 0.2:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:0.2&quot;&gt;
+    ...
+    &lt;action name=&quot;myfirstpigjob&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;foo:8021&lt;/job-tracker&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${jobOutput}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.compress.map.output&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;oozie.action.external.stats.write&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;script&gt;/mypigscript.pig&lt;/script&gt;
+            &lt;argument&gt;-param&lt;/argument&gt;
+            &lt;argument&gt;INPUT=${inputDir}&lt;/argument&gt;
+            &lt;argument&gt;-param&lt;/argument&gt;
+            &lt;argument&gt;OUTPUT=${outputDir}/pig-output3&lt;/argument&gt;
+        &lt;/pig&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><b>Example for Oozie schema 0.1:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:0.1&quot;&gt;
+    ...
+    &lt;action name=&quot;myfirstpigjob&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;foo:8021&lt;/job-tracker&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${jobOutput}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.compress.map.output&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;script&gt;/mypigscript.pig&lt;/script&gt;
+            &lt;param&gt;InputDir=/home/tucu/input-data&lt;/param&gt;
+            &lt;param&gt;OutputDir=${jobOutput}&lt;/param&gt;
+        &lt;/pig&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><a name="FsAction"></a>
+</p>
+<a name="a3.2.4_Fs_HDFS_action"></a>
+</div>
+<div class="section"><h5>3.2.4 Fs (HDFS) action</h5>
+<p>The <tt>fs</tt>
+ action allows manipulating files and directories in HDFS from a workflow
+application. The supported commands
+are <tt>move</tt>, <tt>delete</tt>, <tt>mkdir</tt>, <tt>chmod</tt>,
+<tt>touchz</tt>, <tt>setrep</tt> and <tt>chgrp</tt>.</p>
+<p>The FS commands are executed synchronously from within the FS action; the
+workflow job will wait until the specified
+file commands have completed before continuing to the next action.</p>
+<p>Path names specified in the <tt>fs</tt>
+ action can be parameterized (templatized) using EL expressions.
+Each path name should be specified as an absolute path. For the
+<tt>move</tt>, <tt>delete</tt>, <tt>chmod</tt>
+ and <tt>chgrp</tt>
+ commands, a glob pattern can also be specified instead of an absolute path.
+For <tt>move</tt>
+, a glob pattern can only be specified for the source path, not the target.</p>
+<p>Each file path must specify the file system URI; for move operations, the
+target must not specify the file system URI.</p>
+<p><b>IMPORTANT:</b>
+ For copying files within a cluster it is recommended to use the
+<a href="./DG_DistCpActionExtension.html"><tt>distcp</tt></a>
+ action instead.</p>
+<p><b>IMPORTANT:</b>
+ The commands within an <tt>fs</tt>
+ action do not execute atomically; if an <tt>fs</tt>
+ action fails halfway through its
+commands, the commands that already succeeded are not rolled back.
+To mitigate this, before executing any
+command the <tt>fs</tt>
+ action checks that all source paths exist and all target paths do not exist
+(this target constraint is relaxed for the <tt>move</tt>
+ action; see below for details).
+The validity of all paths specified in one <tt>fs</tt>
+ action is therefore evaluated before any file operation is
+executed, which reduces the chance of an error occurring while the <tt>fs</tt>
+ action executes.</p>
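+<p>As an illustration only (a sketch, not the actual <tt>fs</tt>
+ action implementation; <tt>PlannedOp</tt> and the variable names are
+hypothetical), the fail-fast path validation resembles:</p>
+<p><pre>
+// validate every command's paths before executing any of them
+FileSystem fs = FileSystem.get(conf);
+for (PlannedOp op : plannedOps) { // hypothetical list of the action's commands
+    if (!fs.exists(op.sourcePath)) {
+        throw new IOException(&quot;source path does not exist: &quot; + op.sourcePath);
+    }
+    if (op.targetPath != null &amp;&amp; fs.exists(op.targetPath)) {
+        throw new IOException(&quot;target path already exists: &quot; + op.targetPath);
+    }
+}
+// only after all checks pass are the commands executed
+</pre></p>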
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;fs&gt;
+            &lt;delete path='[PATH]' skip-trash='[true/false]'/&gt;
+            ...
+            &lt;mkdir path='[PATH]'/&gt;
+            ...
+            &lt;move source='[SOURCE-PATH]' target='[TARGET-PATH]'/&gt;
+            ...
+            &lt;chmod path='[PATH]' permissions='[PERMISSIONS]' 
dir-files='false' /&gt;
+            ...
+            &lt;touchz path='[PATH]' /&gt;
+            ...
+            &lt;chgrp path='[PATH]' group='[GROUP]' dir-files='false' /&gt;
+            ...
+            &lt;setrep path='[PATH]' replication-factor='2'/&gt;
+        &lt;/fs&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>delete</tt>
+ command deletes the specified path; if it is a directory, it recursively
+deletes all its content and then
+deletes the directory itself. By default the delete skips the trash. The
+deleted path can be moved to the trash instead by setting skip-trash to
+'false'. The command can also be used to drop HCatalog tables/partitions;
+this is the only FS command which supports HCatalog URIs as well.
+For example:
+<pre>
+&lt;delete path='hcat://[metastore server]:[port]/[database name]/[table 
name]'/&gt;
+OR
+&lt;delete path='hcat://[metastore server]:[port]/[database name]/[table 
name]/[partkey1]=[value];[partkey2]=[value];...'/&gt;
+</pre></p>
+<p>The <tt>mkdir</tt>
+ command creates the specified directory, creating all missing directories
+in the path. If the directory
+already exists, the command is a no-op.</p>
+<p>In the <tt>move</tt>
+ command the <tt>source</tt>
+ path must exist. The following scenarios are addressed for a <tt>move</tt>
+:</p>
+<p><ul><li>The file system URI (e.g. <tt>hdfs://{nameNode}</tt>) can be
+skipped in the <tt>target</tt>
+ path; it is understood to be the same as that of the source. But if the
+target path does contain a file system URI, it cannot be different from that
+of the source.</li>
+<li>The parent directory of the <tt>target</tt>
+ path must exist</li>
+<li>For the <tt>target</tt>
+ path, if it is a file, then it must not already exist.</li>
+<li>However, if the <tt>target</tt>
+ path is an already existing directory, the <tt>move</tt>
+ action will place your <tt>source</tt>
+ as a child of the <tt>target</tt>
+ directory.</li>
+</ul>
+</p>
+<p>The <tt>chmod</tt>
+ command changes the permissions for the specified path. Permissions can be 
specified using the Unix Symbolic
+representation (e.g. -rwxrw-rw-) or an octal representation (755).
+When doing a <tt>chmod</tt>
+ command on a directory, by default the command is applied to the directory 
and the files one level
+within the directory. To apply the <tt>chmod</tt>
+ command to the directory, without affecting the files within it,
+the <tt>dir-files</tt>
+ attribute must be set to <tt>false</tt>
+. To apply the <tt>chmod</tt>
+ command recursively to all levels within a directory, put a <tt>recursive</tt>
+ element inside the <tt>&lt;chmod&gt;</tt> element.</p>
+<p>The <tt>touchz</tt>
+ command creates a zero length file in the specified path if none exists. If 
one already exists, then touchz will perform a touch operation.
+Touchz works only for absolute paths.</p>
+<p>The <tt>chgrp</tt>
+ command changes the group for the specified path.
+When doing a <tt>chgrp</tt>
+ command on a directory, by default the command is applied to the directory 
and the files one level
+within the directory. To apply the <tt>chgrp</tt>
+ command to the directory, without affecting the files within it,
+the <tt>dir-files</tt>
+ attribute must be set to <tt>false</tt>
+.
+To apply the <tt>chgrp</tt>
+ command recursively to all levels within a directory, put a <tt>recursive</tt>
+ element inside the <tt>&lt;chgrp&gt;</tt> element.</p>
+<p>The <tt>setrep</tt>
+ command changes the replication factor of HDFS files. Changing the
+replication factor of directories or symlinks is not
+supported; this command requires the replication-factor argument.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;hdfscommands&quot;&gt;
+         &lt;fs&gt;
+            &lt;delete path='hdfs://foo:8020/usr/tucu/temp-data'/&gt;
+            &lt;mkdir path='archives/${wf:id()}'/&gt;
+            &lt;move source='${jobInput}' 
target='archives/${wf:id()}/processed-input'/&gt;
+            &lt;chmod path='${jobOutput}' permissions='-rwxrw-rw-' 
dir-files='true'&gt;&lt;recursive/&gt;&lt;/chmod&gt;
+            &lt;chgrp path='${jobOutput}' group='testgroup' 
dir-files='true'&gt;&lt;recursive/&gt;&lt;/chgrp&gt;
+            &lt;setrep path='archives/${wf:id()}/filename'
+replication-factor='2'/&gt;
+        &lt;/fs&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>In the above example, a directory named after the workflow job ID is
+created and the input of the job, passed as a
+workflow configuration parameter, is archived under the previously created
+directory.</p>
+<p>As of schema 0.4, if a <tt>name-node</tt>
+ element is specified, then it is not necessary for any of the paths to start 
with the file system
+URI as it is taken from the <tt>name-node</tt>
+ element. This is also true if the name-node is specified in the global 
section 
+(see <a href="./WorkflowFunctionalSpec.html#GlobalConfigurations">Global 
Configurations</a>
+)</p>
+<p>As of schema 0.4, zero or more <tt>job-xml</tt>
+ elements can be specified; these must refer to Hadoop JobConf <tt>job.xml</tt>
+ formatted files
+bundled in the workflow application. They can be used to set additional 
properties for the FileSystem instance.</p>
+<p>As of schema 0.4, if a <tt>configuration</tt>
+ element is specified, then it will also be used to set additional JobConf 
properties for the 
+FileSystem instance. Properties specified in the <tt>configuration</tt>
+ element override properties specified in the files specified 
+by any <tt>job-xml</tt>
+ elements.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:0.4&quot;&gt;
+    ...
+    &lt;action name=&quot;hdfscommands&quot;&gt;
+        &lt;fs&gt;
+           &lt;name-node&gt;hdfs://foo:8020&lt;/name-node&gt;
+           &lt;job-xml&gt;fs-info.xml&lt;/job-xml&gt;
+           &lt;configuration&gt;
+             &lt;property&gt;
+               &lt;name&gt;some.property&lt;/name&gt;
+               &lt;value&gt;some.value&lt;/value&gt;
+             &lt;/property&gt;
+           &lt;/configuration&gt;
+           &lt;delete path='/usr/tucu/temp-data'/&gt;
+        &lt;/fs&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p><a name="SubWorkflowAction"></a>
+</p>
+<a name="a3.2.5_Sub-workflow_Action"></a>
+</div>
+<div class="section"><h5>3.2.5 Sub-workflow Action</h5>
+<p>The <tt>sub-workflow</tt>
+ action runs a child workflow job; the child workflow job can be in the same
+Oozie system or in
+another Oozie system.</p>
+<p>The parent workflow job will wait until the child workflow job has 
completed.</p>
+<p>There can be several sub-workflows defined within a single workflow, each 
under its own action element.</p>
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;sub-workflow&gt;
+            &lt;app-path&gt;[WF-APPLICATION-PATH]&lt;/app-path&gt;
+            &lt;propagate-configuration/&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+        &lt;/sub-workflow&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The child workflow job runs in the same Oozie system instance where the 
parent workflow job is running.</p>
+<p>The <tt>app-path</tt>
+ element specifies the path to the workflow application of the child workflow 
job.</p>
+<p>The <tt>propagate-configuration</tt>
+ flag, if present, indicates that the workflow job configuration should be 
propagated to
+the child workflow.</p>
+<p>The <tt>configuration</tt>
+ section can be used to specify the job properties that are required to run 
the child workflow job.</p>
+<p>The configuration of the <tt>sub-workflow</tt>
+ action can be parameterized (templatized) using EL expressions.</p>
+<p><b>Example:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;sample-wf&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;a&quot;&gt;
+        &lt;sub-workflow&gt;
+            &lt;app-path&gt;child-wf&lt;/app-path&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;input.dir&lt;/name&gt;
+                    &lt;value&gt;${wf:id()}/second-mr-output&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+        &lt;/sub-workflow&gt;
+        &lt;ok to=&quot;end&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>In the above example, the workflow application at the path
+<tt>child-wf</tt>
+ will be run on the same Oozie instance as the parent workflow job.
+The specified workflow application must already be deployed at that path.</p>
+<p>A configuration parameter <tt>input.dir</tt>
+ is passed as a job property to the child workflow job.</p>
+<p>The subworkflow can inherit the lib jars from the parent workflow by 
setting <tt>oozie.subworkflow.classpath.inheritance</tt>
+ to true
+in oozie-site.xml or on a per-job basis by setting 
<tt>oozie.wf.subworkflow.classpath.inheritance</tt>
+ to true in a job.properties file.
+If both are specified, <tt>oozie.wf.subworkflow.classpath.inheritance</tt>
+ has priority.  If the subworkflow and the parent have
+conflicting jars, the subworkflow's jar has priority.  By default, 
<tt>oozie.wf.subworkflow.classpath.inheritance</tt>
+ is set to false.</p>
+<p>To prevent errant workflows from starting infinitely recursive 
subworkflows, <tt>oozie.action.subworkflow.max.depth</tt>
+ can be specified
+in oozie-site.xml to set the maximum depth of subworkflow calls.  For example, 
if set to 3, then a workflow can start subwf1, which
+can start subwf2, which can start subwf3; but if subwf3 tries to start subwf4, 
then the action will fail.  The default is 50.</p>
+<p><a name="JavaAction"></a>
+</p>
+<a name="a3.2.6_Java_Action"></a>
+</div>
+<div class="section"><h5>3.2.6 Java Action</h5>
+<p>The <tt>java</tt>
+ action will execute the <tt>public static void main(String[] args)</tt>
+ method of the specified main Java class.</p>
+<p>Java applications are executed in the Hadoop cluster as a map-reduce job
+with a single Mapper task.</p>
+<p>The workflow job will wait until the java application completes its 
execution before continuing to the next action.</p>
+<p>The <tt>java</tt>
+ action has to be configured with the resource-manager, name-node, main Java 
class, JVM options and arguments.</p>
+<p>To indicate an <tt>ok</tt>
+ action transition, the main Java class must complete the <tt>main</tt>
+ method invocation gracefully.</p>
+<p>To indicate an <tt>error</tt>
+ action transition, the main Java class must throw an exception.</p>
+<p>The main Java class can call <tt>System.exit(int n)</tt>
+. Exit code zero is regarded as OK, while non-zero exit codes will 
+cause the <tt>java</tt>
+ action to do an <tt>error</tt>
+ transition and exit.</p>
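+<p>A minimal sketch of a main class driving these transitions (the class name
+is hypothetical):</p>
+<p><pre>
+public class MyOozieMain {
+    public static void main(String[] args) {
+        try {
+            // ... application logic ...
+        } catch (Exception e) {
+            e.printStackTrace();
+            System.exit(1); // non-zero exit code =&gt; error transition
+        }
+        // returning normally (or System.exit(0)) =&gt; ok transition
+    }
+}
+</pre></p>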
+<p>A <tt>java</tt>
+ action can be configured to perform HDFS files/directories cleanup or 
HCatalog partitions cleanup before
+starting the Java application. This capability enables Oozie to retry a Java 
application in the situation of a transient
+or non-transient failure (this can be used to clean up any temporary data
+which may have been created by the Java
+application in case of failure).</p>
+<p>A <tt>java</tt>
+ action can create a Hadoop configuration for interacting with a cluster
+(e.g. launching a map-reduce job).
+Oozie prepares a Hadoop configuration file which includes the environment's
+site configuration files (e.g. hdfs-site.xml,
+mapred-site.xml, etc) plus the properties added to the
+<tt>&lt;configuration&gt;</tt> section of the <tt>java</tt>
+ action. The Hadoop configuration
+file is made available as a local file to the Java application in its running
+directory. It can be added to the <tt>java</tt>
+ action's
+Hadoop configuration by referencing the system property
+<tt>oozie.action.conf.xml</tt>
+. For example:</p>
+<p><pre>
+// loading action conf prepared by Oozie
+Configuration actionConf = new Configuration(false);
+actionConf.addResource(new Path(&quot;file:///&quot;, 
System.getProperty(&quot;oozie.action.conf.xml&quot;)));
+</pre></p>
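+<p>For instance, the loaded action configuration could then seed a job
+submission. The following sketch assumes the classic
+<tt>org.apache.hadoop.mapred</tt> API; the job name is hypothetical:</p>
+<p><pre>
+// build a JobConf on top of the action conf prepared by Oozie
+JobConf jobConf = new JobConf(actionConf);
+jobConf.setJobName(&quot;child-mr-from-java-action&quot;); // hypothetical name
+// submit the job and wait for it to complete
+RunningJob runningJob = JobClient.runJob(jobConf);
+</pre></p>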
+<p>If <tt>oozie.action.conf.xml</tt>
+ is not added, the job will pick up the mapred-default properties, which may
+result
+in unexpected behaviour. For repeated configuration properties, later values
+override earlier ones.</p>
+<p>Inline property values can be parameterized (templatized) using EL 
expressions.</p>
+<p>The YARN <tt>yarn.resourcemanager.address</tt>
+ (<tt>resource-manager</tt>) and HDFS <tt>fs.default.name</tt>
+ (<tt>name-node</tt>) properties must not be present
+in the <tt>job-xml</tt>
+ and in the inline configuration.</p>
+<p>As with <tt>map-reduce</tt>
+ and <tt>pig</tt>
+ actions, it is possible to make files and archives available to the Java
+application. Refer to
+the section <a href="#FilesArchives">Adding Files and Archives for the Job</a>.</p>
+<p>The <tt>capture-output</tt>
+ element can be used to propagate values back into the Oozie context, where
+they can then be accessed via
+EL functions. The values need to be written out as a Java properties format
+file. The file name is obtained via a system
+property specified by the constant <tt>oozie.action.output.properties</tt>.</p>
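+<p>A minimal sketch of writing such a file from the action's main class (the
+key and value are hypothetical); the written values can later be read with
+the <tt>wf:actionData()</tt> EL function:</p>
+<p><pre>
+// inside the main class of the java action
+// (requires java.io.File, java.io.FileOutputStream, java.io.OutputStream, java.util.Properties)
+Properties props = new Properties();
+props.setProperty(&quot;processedRecords&quot;, &quot;12345&quot;); // hypothetical key/value
+File outputFile = new File(System.getProperty(&quot;oozie.action.output.properties&quot;));
+try (OutputStream os = new FileOutputStream(outputFile)) {
+    props.store(os, &quot;&quot;); // Java properties format, as required by Oozie
+}
+</pre></p>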
+<p><b>IMPORTANT:</b>
+ In order for a Java action to succeed on a secure cluster, it must propagate 
the Hadoop delegation token like in the
+following code snippet (this is benign on non-secure clusters):
+<pre>
+// propagate delegation related props from launcher job to MR job
+if (System.getenv(&quot;HADOOP_TOKEN_FILE_LOCATION&quot;) != null) {
+    jobConf.set(&quot;mapreduce.job.credentials.binary&quot;, 
System.getenv(&quot;HADOOP_TOKEN_FILE_LOCATION&quot;));
+}
+</pre></p>
+<p><b>IMPORTANT:</b>
+ Because the Java application is run from within a map-reduce job, from
+Hadoop 0.20 onwards a queue must
+be assigned to it. The queue name must be specified as a configuration
+property.</p>
+<p><b>IMPORTANT:</b>
+ The Java application from a Java action is executed in a single map task.  If 
the task is abnormally terminated,
+such as due to a TaskTracker restart (e.g. during cluster maintenance), the 
task will be retried via the normal Hadoop task
+retry mechanism.  To avoid workflow failure, the application should be written 
in a fashion that is resilient to such retries,
+for example by detecting and deleting incomplete outputs or picking back up 
from complete outputs.  Furthermore, if a Java action
+spawns asynchronous activity outside the JVM of the action itself (such as by 
launching additional MapReduce jobs), the
+application must consider the possibility of collisions with activity spawned 
by the new instance.</p>
+<p><b>Syntax:</b>
+</p>
+<p><pre>
+&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; 
xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;java&gt;
+            &lt;resource-manager&gt;[RESOURCE-MANAGER]&lt;/resource-manager&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;main-class&gt;[MAIN-CLASS]&lt;/main-class&gt;
+                       &lt;java-opts&gt;[JAVA-STARTUP-OPTS]&lt;/java-opts&gt;
+                       &lt;arg&gt;ARGUMENT&lt;/arg&gt;
+            ...
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+            &lt;capture-output /&gt;
+        &lt;/java&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></p>
+<p>The <tt>prepare</tt>
+ element, if present, indicates a list of paths to delete before starting the
+Java application. It should
+be used exclusively for directory cleanup or for dropping HCatalog tables or
+table partitions before the Java application is executed.
+In case of <tt>delete</tt>
+, a glob pattern can be used to specify the path.
+The format for specifying an HCatalog table URI is
+hcat://[metastore server]:[port]/[database name]/[table name] and the format
+for specifying an HCatalog table partition URI is
+hcat://[metastore server]:[port]/[database name]/[table
+name]/[partkey1]=[value];[partkey2]=[value].
+In case of an HCatalog URI, the hive-site.xml needs to be shipped using the
+<tt>file</tt>
+ tag, and the HCatalog and Hive jars
+need to be placed in the workflow lib directory or specified using the
+<tt>archive</tt>
+ tag.</p>
+<p>The <tt>java-opts</tt>
+ and <tt>java-opt</tt>
+ elements, if present, contain the command line parameters to be used to
+start the JVM that
+will execute the Java application. Using this element is equivalent to using
+the <tt>mapred.child.java.opts</tt>
+or <tt>mapreduce.map.java.opts</tt>

[... 3669 lines stripped ...]
