Added: 
websites/staging/oozie/trunk/content/docs/5.2.1/CoordinatorFunctionalSpec.html
==============================================================================
--- 
websites/staging/oozie/trunk/content/docs/5.2.1/CoordinatorFunctionalSpec.html 
(added)
+++ 
websites/staging/oozie/trunk/content/docs/5.2.1/CoordinatorFunctionalSpec.html 
Fri Feb 26 14:22:45 2021
@@ -0,0 +1,5284 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2021-02-26 
+ | Rendered using Apache Maven Fluido Skin 1.4
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20210226" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Oozie &#x2013; </title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" 
src="./js/apache-maven-fluido-1.4.min.js"></script>
+
+    
+                  </head>
+        <body class="topBarDisabled">
+          
+        
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="https://oozie.apache.org/" 
id="bannerLeft">
+                                                                               
         <img src="https://oozie.apache.org/images/oozie_200x.png"  
alt="Oozie"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org/" class="externalLink" 
title="Apache">
+        Apache</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../../" title="Oozie">
+        Oozie</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../" title="docs">
+        docs</a>
+                    <span class="divider">/</span>
+      </li>
+                <li class="">
+                    <a href="./" title="5.2.1">
+        5.2.1</a>
+                    <span class="divider">/</span>
+      </li>
+        <li class="active "></li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right"><span 
class="divider">|</span> Last Published: 2021-02-26</li>
+              <li id="projectVersion" class="pull-right">
+                    Version: 5.2.1
+        </li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span2">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+  </ul>
+                
+                    
+                
+          <hr />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built 
by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" 
src="./images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span10" >
+                                  
+            <p><a href="index.html">::Go back to Oozie Documentation 
Index::</a></p><hr />
+<h1>Oozie Coordinator Specification</h1>
+<p>The goal of this document is to define a coordinator engine system 
specialized in submitting workflows based on time and data triggers.</p>
+<ul>
+<li><a href="#Changelog">Changelog</a></li>
+<li><a href="#a1._Coordinator_Overview">1. Coordinator Overview</a></li>
+<li><a href="#a2._Definitions">2. Definitions</a></li>
+<li><a href="#a3._Expression_Language_for_Parameterization">3. Expression 
Language for Parameterization</a></li>
+<li><a href="#a4._Datetime_Frequency_and_Time-Period_Representation">4. 
Datetime, Frequency and Time-Period Representation</a>
+<ul>
+<li><a href="#a4.1._Datetime">4.1. Datetime</a>
+<ul>
+<li><a href="#a4.1.1_End_of_the_day_in_Datetime_Values">4.1.1 End of the day 
in Datetime Values</a></li></ul></li>
+<li><a href="#a4.2._Timezone_Representation">4.2. Timezone 
Representation</a></li>
+<li><a href="#a4.3._Timezones_and_Daylight-Saving">4.3. Timezones and 
Daylight-Saving</a></li>
+<li><a href="#a4.4._Frequency_and_Time-Period_Representation">4.4. Frequency 
and Time-Period Representation</a>
+<ul>
+<li><a 
href="#a4.4.1._The_coord:daysint_n_and_coord:endOfDaysint_n_EL_functions">4.4.1.
 The coord:days(int n) and coord:endOfDays(int n) EL functions</a>
+<ul>
+<li><a href="#a4.4.1.1._The_coord:daysint_n_EL_function">4.4.1.1. The 
coord:days(int n) EL function</a></li>
+<li><a href="#a4.4.1.2._The_coord:endOfDaysint_n_EL_function">4.4.1.2. The 
coord:endOfDays(int n) EL function</a></li></ul></li>
+<li><a 
href="#a4.4.2._The_coord:monthsint_n_and_coord:endOfMonthsint_n_EL_functions">4.4.2.
 The coord:months(int n) and coord:endOfMonths(int n) EL functions</a>
+<ul>
+<li><a href="#a4.4.2.1._The_coord:monthsint_n_EL_function">4.4.2.1. The 
coord:months(int n) EL function</a></li>
+<li><a href="#a4.4.2.2._The_coord:endOfMonthsint_n_EL_function">4.4.2.2. The 
coord:endOfMonths(int n) EL function</a></li></ul></li>
+<li><a href="#a4.4.3._The_coord:endOfWeeksint_n_EL_function">4.4.3. The 
coord:endOfWeeks(int n) EL function</a></li>
+<li><a href="#a4.4.4._Cron_syntax_in_coordinator_frequency">4.4.4. Cron syntax 
in coordinator frequency</a></li></ul></li></ul></li>
+<li><a href="#a5._Dataset">5. Dataset</a>
+<ul>
+<li><a href="#a5.1._Synchronous_Datasets">5.1. Synchronous Datasets</a></li>
+<li><a href="#a5.2._Dataset_URI-Template_types">5.2. Dataset URI-Template 
types</a></li>
+<li><a href="#a5.3._Asynchronous_Datasets">5.3. Asynchronous Datasets</a></li>
+<li><a href="#a5.4._Dataset_Definitions">5.4. Dataset 
Definitions</a></li></ul></li>
+<li><a href="#a6._Coordinator_Application">6. Coordinator Application</a>
+<ul>
+<li><a href="#a6.1._Concepts">6.1. Concepts</a>
+<ul>
+<li><a href="#a6.1.1._Coordinator_Application">6.1.1. Coordinator 
Application</a></li>
+<li><a href="#a6.1.2._Coordinator_Job">6.1.2. Coordinator Job</a></li>
+<li><a href="#a6.1.3._Coordinator_Action">6.1.3. Coordinator Action</a>
+<ul>
+<li><a href="#a6.1.3.1._Coordinator_Action_Creation_Materialization">6.1.3.1. 
Coordinator Action Creation (Materialization)</a></li>
+<li><a href="#a6.1.3.2._Coordinator_Action_Status">6.1.3.2. Coordinator Action 
Status</a></li></ul></li>
+<li><a href="#a6.1.4._Input_Events">6.1.4. Input Events</a></li>
+<li><a href="#a6.1.5._Output_Events">6.1.5. Output Events</a></li>
+<li><a href="#a6.1.6._Coordinator_Action_Execution_Policies">6.1.6. 
Coordinator Action Execution Policies</a></li>
+<li><a href="#a6.1.7._Data_Pipeline_Application">6.1.7. Data Pipeline 
Application</a></li></ul></li>
+<li><a href="#a6.2._Synchronous_Coordinator_Application_Example">6.2. 
Synchronous Coordinator Application Example</a></li>
+<li><a href="#a6.3._Synchronous_Coordinator_Application_Definition">6.3. 
Synchronous Coordinator Application Definition</a></li>
+<li><a href="#a6.4._Asynchronous_Coordinator_Application_Definition">6.4. 
Asynchronous Coordinator Application Definition</a></li>
+<li><a href="#a6.5._Parameterization_of_Coordinator_Applications">6.5. 
Parameterization of Coordinator Applications</a></li>
+<li><a 
href="#a6.6._Parameterization_of_Dataset_Instances_in_Input_and_Output_Events">6.6.
 Parameterization of Dataset Instances in Input and Output Events</a>
+<ul>
+<li><a 
href="#a6.6.1._coord:currentint_n_EL_Function_for_Synchronous_Datasets">6.6.1. 
coord:current(int n) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.2._coord:offsetint_n_String_timeUnit_EL_Function_for_Synchronous_Datasets">6.6.2.
 coord:offset(int n, String timeUnit) EL Function for Synchronous 
Datasets</a></li>
+<li><a 
href="#a6.6.3._coord:hoursInDayint_n_EL_Function_for_Synchronous_Datasets">6.6.3.
 coord:hoursInDay(int n) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.4._coord:daysInMonthint_n_EL_Function_for_Synchronous_Datasets">6.6.4.
 coord:daysInMonth(int n) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.5._coord:tzOffset_EL_Function_for_Synchronous_Datasets">6.6.5. 
coord:tzOffset() EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.6._coord:latestint_n_EL_Function_for_Synchronous_Datasets">6.6.6. 
coord:latest(int n) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.7._coord:futureint_n_int_limit_EL_Function_for_Synchronous_Datasets">6.6.7.
 coord:future(int n, int limit) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.8._coord:absoluteString_timeStamp_EL_Function_for_Synchronous_Datasets">6.6.8.
 coord:absolute(String timeStamp) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.9._coord:endOfMonthsint_n_EL_Function_for_Synchronous_Datasets">6.6.9.
 coord:endOfMonths(int n) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.10._coord:endOfWeeksint_n_EL_Function_for_Synchronous_Datasets">6.6.10.
 coord:endOfWeeks(int n) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.11._coord:endOfDaysint_n_EL_Function_for_Synchronous_Datasets">6.6.11.
 coord:endOfDays(int n) EL Function for Synchronous Datasets</a></li>
+<li><a 
href="#a6.6.12._coord:versionint_n_EL_Function_for_Asynchronous_Datasets">6.6.12.
 coord:version(int n) EL Function for Asynchronous Datasets</a></li>
+<li><a 
href="#a6.6.13._coord:latestint_n_EL_Function_for_Asynchronous_Datasets">6.6.13.
 coord:latest(int n) EL Function for Asynchronous Datasets</a></li>
+<li><a 
href="#a6.6.14._Dataset_Instance_Resolution_for_Instances_Before_the_Initial_Instance">6.6.14.
 Dataset Instance Resolution for Instances Before the Initial 
Instance</a></li></ul></li>
+<li><a href="#a6.7._Parameterization_of_Coordinator_Application_Actions">6.7. 
Parameterization of Coordinator Application Actions</a>
+<ul>
+<li><a href="#a6.7.1._coord:dataInString_name_EL_Function">6.7.1. 
coord:dataIn(String name) EL Function</a></li>
+<li><a href="#a6.7.2._coord:dataOutString_name_EL_Function">6.7.2. 
coord:dataOut(String name) EL Function</a></li>
+<li><a href="#a6.7.3._coord:nominalTime_EL_Function">6.7.3. 
coord:nominalTime() EL Function</a></li>
+<li><a href="#a6.7.4._coord:actualTime_EL_Function">6.7.4. coord:actualTime() 
EL Function</a></li>
+<li><a href="#a6.7.5._coord:user_EL_Function_since_Oozie_2.3">6.7.5. 
coord:user() EL Function (since Oozie 2.3)</a></li></ul></li>
+<li><a 
href="#a6.8_Using_HCatalog_data_instances_in_Coordinator_Applications_since_Oozie_4.x">6.8
 Using HCatalog data instances in Coordinator Applications (since Oozie 4.x)</a>
+<ul>
+<li><a 
href="#a6.8.1_coord:databaseInString_name_coord:databaseOutString_name_EL_function">6.8.1
 coord:databaseIn(String name), coord:databaseOut(String name) EL 
function</a></li>
+<li><a 
href="#a6.8.2_coord:tableInString_name_coord:tableOutString_name_EL_function">6.8.2
 coord:tableIn(String name), coord:tableOut(String name) EL function</a></li>
+<li><a 
href="#a6.8.3_coord:dataInPartitionFilterString_name_String_type_EL_function">6.8.3
 coord:dataInPartitionFilter(String name, String type) EL function</a></li>
+<li><a href="#a6.8.4_coord:dataOutPartitionsString_name_EL_function">6.8.4 
coord:dataOutPartitions(String name) EL function</a></li>
+<li><a 
href="#a6.8.5_coord:dataInPartitionMinString_name_String_partition_EL_function">6.8.5
 coord:dataInPartitionMin(String name, String partition) EL function</a></li>
+<li><a 
href="#a6.8.6_coord:dataInPartitionMaxString_name_String_partition_EL_function">6.8.6
 coord:dataInPartitionMax(String name, String partition) EL function</a></li>
+<li><a 
href="#a6.8.7_coord:dataOutPartitionValueString_name_String_partition_EL_function">6.8.7
 coord:dataOutPartitionValue(String name, String partition) EL function</a></li>
+<li><a 
href="#a6.8.8_coord:dataInPartitionsString_name_String_type_EL_function">6.8.8 
coord:dataInPartitions(String name, String type) EL function</a></li></ul></li>
+<li><a href="#a6.9._Parameterization_of_Coordinator_Application">6.9. 
Parameterization of Coordinator Application</a>
+<ul>
+<li><a 
href="#a6.9.1._coord:dateOffsetString_baseDate_int_instance_String_timeUnit_EL_Function">6.9.1.
 coord:dateOffset(String baseDate, int instance, String timeUnit) EL 
Function</a></li>
+<li><a 
href="#a6.9.2._coord:dateTzOffsetString_baseDate_String_timezone_EL_Function">6.9.2.
 coord:dateTzOffset(String baseDate, String timezone) EL Function</a></li>
+<li><a 
href="#a6.9.3._coord:formatTimeString_ts_String_format_EL_Function_since_Oozie_2.3.2">6.9.3.
 coord:formatTime(String ts, String format) EL Function (since Oozie 
2.3.2)</a></li>
+<li><a 
href="#a6.9.4._coord:epochTimeString_ts_String_millis_EL_Function_since_Oozie_4.3">6.9.4.
 coord:epochTime(String ts, String millis) EL Function (since Oozie 
4.3)</a></li></ul></li>
+<li><a href="#a6.10._Conditional_coordinator_input_logic">6.10. Conditional 
coordinator input logic</a></li></ul></li>
+<li><a href="#a7._Handling_Timezones_and_Daylight_Saving_Time">7. Handling 
Timezones and Daylight Saving Time</a>
+<ul>
+<li><a href="#a7.1._Handling_Timezones_with_No_Day_Light_Saving_Time">7.1. 
Handling Timezones with No Day Light Saving Time</a></li>
+<li><a href="#a7.2._Handling_Timezones_with_Daylight_Saving_Time">7.2. 
Handling Timezones with Daylight Saving Time</a></li>
+<li><a href="#a7.3._Timezone_and_Daylight_Saving_Tools">7.3. Timezone and 
Daylight Saving Tools</a></li></ul></li>
+<li><a href="#a8._Operational_Considerations">8. Operational Considerations</a>
+<ul>
+<li><a href="#a8.1._Reprocessing">8.1. Reprocessing</a></li></ul></li>
+<li><a href="#a9._User_Propagation">9. User Propagation</a></li>
+<li><a href="#a10._Coordinator_Application_Deployment">10. Coordinator 
Application Deployment</a>
+<ul>
+<li><a href="#a10.1._Organizing_Coordinator_Applications">10.1. Organizing 
Coordinator Applications</a>
+<ul>
+<li><a href="#a10.1.1._Dataset_Names_Collision_Resolution">10.1.1. Dataset 
Names Collision Resolution</a></li></ul></li></ul></li>
+<li><a href="#a11._Coordinator_Job_Submission">11. Coordinator Job 
Submission</a></li>
+<li><a href="#a12._SLA_Handling">12. SLA Handling</a>
+<ul>
+<li><a href="#Coordinator_SLA_Example">Coordinator SLA Example</a></li>
+<li><a href="#Workflow_SLA_Example">Workflow SLA Example</a></li></ul></li>
+<li><a href="#a13._Web_Services_API">13. Web Services API</a></li>
+<li><a href="#a14._Coordinator_Rerun">14. Coordinator Rerun</a>
+<ul>
+<li><a href="#Rerunning_a_Coordinator_Action_or_Multiple_Actions">Rerunning a 
Coordinator Action or Multiple Actions</a></li></ul></li>
+<li><a href="#a15._Coordinator_Notifications">15. Coordinator Notifications</a>
+<ul>
+<li><a href="#a15.1_Coordinator_Action_Status_Notification">15.1 Coordinator 
Action Status Notification</a></li></ul></li>
+<li><a href="#Appendixes">Appendixes</a>
+<ul>
+<li><a href="#Appendix_A_Oozie_Coordinator_XML-Schema">Appendix A, Oozie 
Coordinator XML-Schema</a>
+<ul>
+<li><a href="#Oozie_Coordinator_Schema_0.5">Oozie Coordinator Schema 
0.5</a></li>
+<li><a href="#Oozie_Coordinator_Schema_0.4">Oozie Coordinator Schema 
0.4</a></li>
+<li><a href="#Oozie_Coordinator_Schema_0.2">Oozie Coordinator Schema 
0.2</a></li>
+<li><a href="#Oozie_Coordinator_Schema_0.1">Oozie Coordinator Schema 
0.1</a></li>
+<li><a href="#Oozie_SLA_Schemas">Oozie SLA Schemas</a>
+<ul>
+<li><a href="#Oozie_SLA_Version_0.2">Oozie SLA Version 0.2</a></li>
+<li><a href="#Oozie_SLA_Version_0.1">Oozie SLA Version 
0.1</a></li></ul></li></ul></li></ul></li></ul>
+
+<div class="section">
+<h2><a name="Changelog"></a>Changelog</h2>
+<p><b>03/JUL/2013</b></p>
+<ul>
+
+<li>Appendix A, Added new coordinator schema 0.4, sla schema 0.2 and changed 
schemas ordering to newest first</li>
+</ul>
+<p><b>07/JAN/2013</b></p>
+<ul>
+
+<li>6.8 Added section on new EL functions for datasets defined with 
HCatalog</li>
+</ul>
+<p><b>26/JUL/2012</b></p>
+<ul>
+
+<li>Appendix A, updated XML schema 0.4 to include <tt>parameters</tt> 
element</li>
+<li>6.5 Updated to mention the <tt>parameters</tt> element as of schema 
0.4</li>
+</ul>
+<p><b>23/NOV/2011:</b></p>
+<ul>
+
+<li>Update execution order typo</li>
+</ul>
+<p><b>05/MAY/2011:</b></p>
+<ul>
+
+<li>Update coordinator schema 0.2</li>
+</ul>
+<p><b>09/MAR/2011:</b></p>
+<ul>
+
+<li>Update coordinator status</li>
+</ul>
+<p><b>02/DEC/2010:</b></p>
+<ul>
+
+<li>Update coordinator done-flag</li>
+</ul>
+<p><b>26/AUG/2010:</b></p>
+<ul>
+
+<li>Update coordinator rerun</li>
+</ul>
+<p><b>09/JUN/2010:</b></p>
+<ul>
+
+<li>Clean up unsupported functions</li>
+</ul>
+<p><b>02/JUN/2010:</b></p>
+<ul>
+
+<li>Update all EL functions in CoordFunctionalSpec with &#x201c;coord:&#x201d; 
prefix</li>
+</ul>
+<p><b>02/OCT/2009:</b></p>
+<ul>
+
+<li>Added Appendix A, Oozie Coordinator XML-Schema</li>
+<li>Change #5.3., Datasets definition supports &#x2018;include&#x2019; 
element</li>
+</ul>
+<p><b>29/SEP/2009:</b></p>
+<ul>
+
+<li>Change #4.4.1, added <tt>${coord:endOfDays(int n)}</tt> EL function</li>
+<li>Change #4.4.2, added <tt>${coord:endOfMonths(int n)}</tt> EL function</li>
+</ul>
+<p><b>11/SEP/2009:</b></p>
+<ul>
+
+<li>Change #6.6.4. <tt>${coord:tzOffset()}</tt> EL function now returns offset 
in minutes. Added more explanation on behavior</li>
+<li>Removed &#x2018;oozie&#x2019; URL from action workflow invocation, per 
arch review feedback coord&amp;wf run on the same instance</li>
+</ul>
+<p><b>07/SEP/2009:</b></p>
+<ul>
+
+<li>Full rewrite of sections #4 and #7</li>
+<li>Added sections #6.1.7, #6.6.2, #6.6.3 &amp; #6.6.4</li>
+<li>Rewording through the spec definitions</li>
+<li>Updated all examples and syntax to latest changes</li>
+</ul>
+<p><b>03/SEP/2009:</b></p>
+<ul>
+
+<li>Change #2. Definitions. Some rewording in the definitions</li>
+<li>Change #6.6.4. Replaced <tt>${coord:next(int n)}</tt> with 
<tt>${coord:version(int n)}</tt> EL Function</li>
+<li>Added #6.6.5. Dataset Instance Resolution for Instances Before the Initial 
Instance</li>
+</ul></div>
+<div class="section">
+<h2><a name="a1._Coordinator_Overview"></a>1. Coordinator Overview</h2>
+<p>Users typically run map-reduce, hadoop-streaming, hdfs and/or Pig jobs on 
the grid. Several of these jobs can be combined to form a workflow job. <a 
class="externalLink" 
href="https://issues.apache.org/jira/browse/HADOOP-5303">Oozie, Hadoop Workflow 
System</a> defines a workflow system that runs such jobs.</p>
+<p>Commonly, workflow jobs are run based on regular time intervals and/or data 
availability. And, in some cases, they can be triggered by an external 
event.</p>
+<p>Expressing the condition(s) that trigger a workflow job can be modeled as a 
predicate that has to be satisfied. The workflow job is started after the 
predicate is satisfied. A predicate can reference data, time and/or external 
events. In the future, the model can be extended to support additional event 
types.</p>
+<p>It is also necessary to connect workflow jobs that run regularly, but at 
different time intervals. The outputs of multiple subsequent runs of a workflow 
become the input to the next workflow. For example, the outputs of the last 4 
runs of a workflow that runs every 15 minutes become the input of another 
workflow that runs every 60 minutes. The result of chaining these workflows 
together is referred to as a data application pipeline.</p>
+<p>The Oozie <b>Coordinator</b> system allows the user to define and execute 
recurrent and interdependent workflow jobs (data application pipelines).</p>
+<p>Real world data application pipelines have to account for reprocessing, 
late processing, catchup, partial processing, monitoring, notification and 
SLAs.</p>
+<p>This document defines the functional specification for the Oozie 
Coordinator system.</p></div>
+<div class="section">
+<h2><a name="a2._Definitions"></a>2. Definitions</h2>
+<p><b>Actual time:</b> The actual time indicates the time when something 
actually happens.</p>
+<p><b>Nominal time:</b> The nominal time specifies the time when something 
should happen. In theory the nominal time and the actual time should match, 
however, in practice due to delays the actual time may occur later than the 
nominal time.</p>
+<p><b>Dataset:</b> Collection of data referred to by a logical name. A dataset 
normally has several instances of data and each one of them can be referred 
individually. Each dataset instance is represented by a unique set of URIs.</p>
+<p><b>Synchronous Dataset:</b> Synchronous dataset instances are generated at 
fixed time intervals and there is a dataset instance associated with each time 
interval. Synchronous dataset instances are identified by their nominal time. 
For example, in the case of an HDFS-based dataset, the nominal time would be 
somewhere in the file path of the dataset instance: 
<tt>hdfs://foo:8020/usr/logs/2009/04/15/23/30</tt>. In the case of HCatalog 
table partitions, the nominal time would be part of some partition values: 
<tt>hcat://bar:8020/mydb/mytable/year=2009;month=04;dt=15;region=us</tt>.</p>
+<p><b>Coordinator Action:</b> A coordinator action is a workflow job that is 
started when a set of conditions is met (input dataset instances are 
available).</p>
+<p><b>Coordinator Application:</b> A coordinator application defines the 
conditions under which coordinator actions should be created (the frequency) 
and when the actions can be started. The coordinator application also defines a 
start and an end time. Normally, coordinator applications are parameterized. A 
Coordinator application is written in XML.</p>
+<p><b>Coordinator Job:</b> A coordinator job is an executable instance of a 
coordination definition. A job submission is done by submitting a job 
configuration that resolves all parameters in the application definition.</p>
+<p><b>Data pipeline:</b> A data pipeline is a connected set of coordinator 
applications that consume and produce interdependent datasets.</p>
+<p><b>Coordinator Definition Language:</b> The language used to describe 
datasets and coordinator applications.</p>
+<p><b>Coordinator Engine:</b> A system that executes coordinator 
jobs.</p></div>
+<div class="section">
+<h2><a name="a3._Expression_Language_for_Parameterization"></a>3. Expression 
Language for Parameterization</h2>
+<p>Coordinator application definitions can be parameterized with variables, 
built-in constants and built-in functions.</p>
+<p>At execution time all the parameters are resolved into concrete values.</p>
+<p>The parameterization of workflow definitions is done using JSP Expression 
Language syntax from the <a class="externalLink" 
href="http://jcp.org/aboutJava/communityprocess/final/jsr152/index.html">JSP 
2.0 Specification (JSP.2.3)</a>, which supports not only variables as 
parameters but also functions and complex expressions.</p>
+<p>EL expressions can be used in XML attribute values and XML text element 
values. They cannot be used in XML element and XML attribute names.</p>
+<p>Refer to section #6.5 &#x2018;Parameterization of Coordinator 
Applications&#x2019; for more details.</p></div>
+<div class="section">
+<h2><a name="a4._Datetime_Frequency_and_Time-Period_Representation"></a>4. 
Datetime, Frequency and Time-Period Representation</h2>
+<p>Oozie processes coordinator jobs in a fixed timezone with no DST (typically 
<tt>UTC</tt>). This timezone is referred to as the &#x2018;Oozie processing 
timezone&#x2019;.</p>
+<p>The Oozie processing timezone is used to resolve coordinator jobs start/end 
times, job pause times and the initial-instance of datasets. Also, all 
coordinator dataset instance URI templates are resolved to a datetime in the 
Oozie processing timezone.</p>
+<p>All the datetimes used in coordinator applications and job parameters to 
coordinator applications must be specified in the Oozie processing timezone. If 
the Oozie processing timezone is <tt>UTC</tt>, the qualifier is <b>Z</b>. If the 
Oozie processing timezone is other than <tt>UTC</tt>, the qualifier must be the 
GMT offset, <tt>(+/-)####</tt>.</p>
+<p>For example, a datetime in <tt>UTC</tt> is <tt>2012-08-12T00:00Z</tt>; the 
same datetime in <tt>GMT+5:30</tt> is <tt>2012-08-12T05:30+0530</tt>.</p>
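+<p>The relationship between the two notations above can be checked with a 
short Python sketch. It is only an illustration, not Oozie code: the helper 
name <tt>format_oozie_datetime</tt> and its <tt>offset_minutes</tt> parameter 
are hypothetical.</p>

```python
from datetime import datetime, timedelta, timezone


def format_oozie_datetime(instant, offset_minutes=0):
    """Render an aware datetime in a fixed-offset timezone (hypothetical helper).

    offset_minutes is the GMT offset of the assumed processing timezone;
    0 means UTC and yields the 'Z' qualifier, anything else yields (+/-)####.
    """
    tz = timezone(timedelta(minutes=offset_minutes))
    local = instant.astimezone(tz)
    if offset_minutes == 0:
        return local.strftime("%Y-%m-%dT%H:%MZ")
    return local.strftime("%Y-%m-%dT%H:%M%z")


instant = datetime(2012, 8, 12, 0, 0, tzinfo=timezone.utc)
print(format_oozie_datetime(instant))       # 2012-08-12T00:00Z
print(format_oozie_datetime(instant, 330))  # 2012-08-12T05:30+0530
```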
+<p>For simplicity, the rest of this specification uses <tt>UTC</tt> 
datetimes.</p>
+<p><a name="datetime"></a></p>
+<div class="section">
+<h3><a name="a4.1._Datetime"></a>4.1. Datetime</h3>
+<p>If the Oozie processing timezone is <tt>UTC</tt>, all datetime values are 
always in <a class="externalLink" 
href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time">UTC</a> down to 
a minute precision, &#x2018;YYYY-MM-DDTHH:mmZ&#x2019;.</p>
+<p>For example <tt>2009-08-10T13:10Z</tt> is August 10th 2009 at 13:10 UTC.</p>
+<p>If the Oozie processing timezone is a GMT offset <tt>GMT(+/-)####</tt>, all 
datetime values are always in <a class="externalLink" 
href="http://en.wikipedia.org/wiki/ISO_8601">ISO 8601</a> in the corresponding 
GMT offset down to a minute precision, 
&#x2018;YYYY-MM-DDTHH:mmGMT(+/-)####&#x2019;.</p>
+<p>For example <tt>2009-08-10T13:10+0530</tt> is August 10th 2009 at 13:10 
GMT+0530, India timezone.</p>
+<div class="section">
+<h4><a name="a4.1.1_End_of_the_day_in_Datetime_Values"></a>4.1.1 End of the 
day in Datetime Values</h4>
+<p>It is valid to express the end of day as a &#x2018;24:00&#x2019; hour (e.g. 
<tt>2009-08-10T24:00Z</tt>).</p>
+<p>However, for all calculations and display, Oozie resolves such dates as the 
zero hour of the following day (e.g. 
<tt>2009-08-11T00:00Z</tt>).</p></div></div>
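+<p>The &#x2018;24:00&#x2019; rule can be sketched as a small normalization 
step. This is a minimal illustration that assumes UTC values in the 
&#x2018;YYYY-MM-DDTHH:mmZ&#x2019; form only; the function name 
<tt>normalize_end_of_day</tt> is hypothetical and does not come from Oozie.</p>

```python
from datetime import datetime, timedelta


def normalize_end_of_day(value):
    """Resolve 'YYYY-MM-DDT24:00Z' to the zero hour of the following day.

    Hypothetical helper: values that do not use the 24:00 hour are
    returned unchanged.
    """
    date_part, time_part = value.split("T")
    if time_part == "24:00Z":
        next_day = datetime.strptime(date_part, "%Y-%m-%d") + timedelta(days=1)
        return next_day.strftime("%Y-%m-%dT00:00Z")
    return value


print(normalize_end_of_day("2009-08-10T24:00Z"))  # 2009-08-11T00:00Z
print(normalize_end_of_day("2009-08-10T13:10Z"))  # 2009-08-10T13:10Z
```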
+<div class="section">
+<h3><a name="a4.2._Timezone_Representation"></a>4.2. Timezone 
Representation</h3>
+<p>There is no widely accepted standard to identify timezones.</p>
+<p>Oozie Coordinator will understand the following timezone identifiers:</p>
+<ul>
+
+<li>Generic NON-DST timezone identifier: <tt>GMT[+/-]##:##</tt> (e.g. 
GMT+05:30)</li>
+<li>UTC timezone identifier: <tt>UTC</tt> (e.g. 2009-06-06T00:00Z)</li>
+<li>ZoneInfo identifiers, with DST support, understood by Java JDK (about 600 
IDs) (e.g. America/Los_Angeles)</li>
+</ul>
+<p>Because of the DST shift from PST to PDT, GMT, UTC or Region/City timezone 
notation is preferred over a direct three-letter ID (PST, PDT, BST, etc.). For 
example, America/Los_Angeles switches from PST to PDT at a DST shift. If PST is 
used directly, it will not follow the DST shift when the time switches to 
PDT.</p>
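+<p>The difference between a Region/City identifier and a fixed offset can be 
seen in the following Python sketch. It assumes Python 3.9+ with the 
<tt>zoneinfo</tt> module and an installed timezone database; the helper 
<tt>utc_offset_hours</tt> is hypothetical, and a fixed GMT-8 offset stands in 
for a three-letter PST identifier.</p>

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumption: Python 3.9+ with tz data available


def utc_offset_hours(tz, year, month, day):
    """Return the UTC offset, in hours, at local noon of the given date."""
    local_noon = datetime(year, month, day, 12, 0, tzinfo=tz)
    return local_noon.utcoffset() / timedelta(hours=1)


region = ZoneInfo("America/Los_Angeles")    # follows DST shifts
fixed_pst = timezone(timedelta(hours=-8))   # stand-in for a fixed "PST" offset

print(utc_offset_hours(region, 2009, 1, 15))     # -8.0 (PST)
print(utc_offset_hours(region, 2009, 7, 15))     # -7.0 (PDT)
print(utc_offset_hours(fixed_pst, 2009, 7, 15))  # -8.0 (no DST shift)
```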
+<p>Oozie Coordinator must provide a tool for developers to list all supported 
timezone identifiers.</p></div>
+<div class="section">
+<h3><a name="a4.3._Timezones_and_Daylight-Saving"></a>4.3. Timezones and 
Daylight-Saving</h3>
+<p>While Oozie coordinator engine works in a fixed timezone with no DST 
(typically <tt>UTC</tt>), it provides DST support for coordinator 
applications.</p>
+<p>The baseline datetime for datasets and coordinator applications is 
expressed in UTC. The baseline datetime is the time of the first occurrence.</p>
+<p>Datasets and coordinator applications also contain a timezone indicator.</p>
+<p>The use of UTC as baseline enables a simple way of mixing and matching 
datasets and coordinator applications that use a different timezone by just 
adding the timezone offset.</p>
+<p>The timezone indicator enables Oozie coordinator engine to properly compute 
frequencies that are daylight-saving sensitive. For example: a daily frequency 
can be 23, 24 or 25 hours for timezones that observe daylight-saving. Weekly 
and monthly frequencies are also affected by this as the number of hours in the 
day may change.</p>
+<p>Section #7 &#x2018;Handling Timezones and Daylight Saving Time&#x2019; 
explains how coordinator applications can be written to handle timezones and 
daylight-saving-time properly.</p></div>
+<div class="section">
+<h3><a name="a4.4._Frequency_and_Time-Period_Representation"></a>4.4. 
Frequency and Time-Period Representation</h3>
+<p>Frequency is used to capture the periodic intervals at which datasets are 
produced and coordinator applications are scheduled to run.</p>
+<p>This time-period representation is also used to specify non-recurrent 
time-periods, for example a timeout interval.</p>
+<p>For datasets and coordinator applications the frequency time-period is 
applied <tt>N</tt> times to the baseline datetime to compute recurrent 
times.</p>
+<p>Frequency is always expressed in minutes.</p>
+<p>Because the number of minutes in a day may vary for timezones that observe 
daylight saving time, constants cannot be used to express frequencies greater 
than a day for datasets and coordinator applications for such timezones. For 
such use cases, Oozie coordinator provides 2 EL functions, 
<tt>${coord:days(int n)}</tt> and <tt>${coord:months(int n)}</tt>.</p>
+<p>Frequencies can be expressed using EL constants and EL functions that 
evaluate to a positive integer number.</p>
+<p>Coordinator Frequencies can also be expressed using cron syntax.</p>
+<p><b><font color="#008000"> Examples: </font></b></p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>EL Constant</b> </th>
+<th> <b>Value</b> </th>
+<th> <b>Example</b> </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>${coord:minutes(int n)}</tt> </td>
+<td> <i>n</i> </td>
+<td> <tt>${coord:minutes(45)}</tt> &#x2013;&gt; <tt>45</tt> </td></tr>
+<tr class="a">
+<td> <tt>${coord:hours(int n)}</tt> </td>
+<td> <i>n * 60</i> </td>
+<td> <tt>${coord:hours(3)}</tt> &#x2013;&gt; <tt>180</tt> </td></tr>
+<tr class="b">
+<td> <tt>${coord:days(int n)}</tt> </td>
+<td> <i>variable</i> </td>
+<td> <tt>${coord:days(2)}</tt> &#x2013;&gt; minutes in 2 full days from the 
current date </td></tr>
+<tr class="a">
+<td> <tt>${coord:months(int n)}</tt> </td>
+<td> <i>variable</i> </td>
+<td> <tt>${coord:months(1)}</tt> &#x2013;&gt; minutes in 1 full month from 
the current date </td></tr>
+<tr class="b">
+<td> <tt>${cron syntax}</tt> </td>
+<td> <i>variable</i> </td>
+<td> <tt>${0,10 15 * * 2-6}</tt> &#x2013;&gt; a job that runs every weekday at 
3:00pm and 3:10pm UTC time</td></tr>
+</tbody>
+</table>
+<p>Note that the <tt>${coord:days(int n)}</tt> and <tt>${coord:months(int 
n)}</tt> EL functions calculate minutes precisely, including variations due to 
daylight saving time, when they are used for Frequency representation. When 
they are specified for a coordinator timeout interval, one day is calculated as 
24 hours and one month is calculated as 30 days for simplicity.</p>
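+<p>The simplified timeout arithmetic described in the note above works out as 
follows. This is a sketch of the arithmetic only; the function names are 
hypothetical and are not Oozie APIs.</p>

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24    # timeout rule from the note: one day is 24 hours
DAYS_PER_MONTH = 30   # timeout rule from the note: one month is 30 days


def timeout_days_to_minutes(n):
    """Minutes in n days under the fixed 24-hour-day timeout rule."""
    return n * HOURS_PER_DAY * MINUTES_PER_HOUR


def timeout_months_to_minutes(n):
    """Minutes in n months under the fixed 30-day-month timeout rule."""
    return n * DAYS_PER_MONTH * HOURS_PER_DAY * MINUTES_PER_HOUR


print(timeout_days_to_minutes(2))    # 2880
print(timeout_months_to_minutes(1))  # 43200
```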
+<div class="section">
+<h4><a 
name="a4.4.1._The_coord:daysint_n_and_coord:endOfDaysint_n_EL_functions"></a>4.4.1.
 The coord:days(int n) and coord:endOfDays(int n) EL functions</h4>
+<p>The <tt>${coord:days(int n)}</tt> and <tt>${coord:endOfDays(int n)}</tt> EL 
functions should be used to handle day based frequencies.</p>
+<p>Constant values should not be used to indicate a day based frequency (every 
1 day, every 1 week, etc) because the number of hours in every day is not 
always the same for timezones that observe daylight-saving time.</p>
+<p>It is good practice to always use these EL functions instead of a constant 
expression (e.g. <tt>24 * 60</tt>), even if the timezone for which the 
application is being written does not observe daylight saving time. This makes 
the application robust to changes in country legislation and also makes 
applications portable across timezones.</p>
+<div class="section">
+<h5><a name="a4.4.1.1._The_coord:daysint_n_EL_function"></a>4.4.1.1. The 
coord:days(int n) EL function</h5>
+<p>The <tt>${coord:days(int n)}</tt> EL function returns the number of minutes 
for &#x2018;n&#x2019; complete days starting with the day of the specified 
nominal time for which the computation is being done.</p>
+<p>The <tt>${coord:days(int n)}</tt> EL function includes <b>all</b> the 
minutes of the current day, regardless of the time of the day of the current 
nominal time.</p>
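+<p>As an illustration only, the computation described above can be 
approximated in Python with the standard <tt>zoneinfo</tt> module. The function 
name <tt>days_in_minutes</tt> is hypothetical and this is not Oozie&#x2019;s 
implementation; it is a sketch that assumes a day runs from one local midnight 
to the next in the given timezone.</p>

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def days_in_minutes(nominal_utc, tz_name, n):
    """Minutes in n complete local days, starting with the day of nominal_utc."""
    tz = ZoneInfo(tz_name)
    first_day = nominal_utc.astimezone(tz).date()
    last_day = first_day + timedelta(days=n)
    # Local midnights that bound the n-day window.
    start = datetime(first_day.year, first_day.month, first_day.day, tzinfo=tz)
    end = datetime(last_day.year, last_day.month, last_day.day, tzinfo=tz)
    # Subtract in UTC so that a DST shift inside the window is counted.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return int(elapsed.total_seconds()) // 60


nominal = datetime(2009, 3, 8, 8, 0, tzinfo=timezone.utc)
print(days_in_minutes(nominal, "UTC", 1))                  # 1440
print(days_in_minutes(nominal, "America/Los_Angeles", 1))  # 1380
print(days_in_minutes(nominal, "America/Los_Angeles", 2))  # 2820
```

+<p>The 1380 result reflects the 23-hour local day on 2009-03-08, when 
daylight saving time starts in the US.</p>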
+<p><b><font color="#008000"> Examples: </font></b></p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>Starting Nominal UTC time</b> </th>
+<th> <b>Timezone</b> </th>
+<th> <b>Usage</b>  </th>
+<th> <b>Value</b> </th>
+<th> <b>First Occurrence</b> </th>
+<th> <b>Comments</b> </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:days(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> total minutes on 2009JAN01 UTC time </td></tr>
+<tr class="a">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:days(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> total minutes in 2009JAN01 PST8PDT time </td></tr>
+<tr class="b">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:days(2)}</tt> </td>
+<td> 2880 </td>
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> total minutes in 2009JAN01 and 2009JAN02 PST8PDT time </td></tr>
+<tr class="a">
+<td colspan="5"> </td></tr>
+<tr class="b">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:days(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes on 2009MAR08 UTC time </td></tr>
+<tr class="a">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>Europe/London</tt> </td>
+<td> <tt>${coord:days(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR08 BST1BDT time </td></tr>
+<tr class="b">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:days(1)}</tt> </td>
+<td> 1380 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR08 PST8PDT time <br /> (2009MAR08 is DST switch 
in the US) </td></tr>
+<tr class="a">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:days(2)}</tt> </td>
+<td> 2880 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR08 and 2009MAR09 UTC time </td></tr>
+<tr class="b">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:days(2)}</tt> </td>
+<td> 2820 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR08 and 2009MAR09 PST8PDT time <br /> (2009MAR08 
is DST switch in the US) </td></tr>
+<tr class="a">
+<td> <tt>2009-03-09T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:days(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-03-09T07:00Z</tt> </td>
+<td> total minutes in 2009MAR09 PST8PDT time <br /> (2009MAR08 is DST ON, 
frequency tick is earlier in UTC) </td></tr>
+</tbody>
+</table>
+<p>For all these examples, the first occurrence of the frequency will be at 
<tt>08:00Z</tt> (UTC time).</p></div>
+<div class="section">
+<h5><a name="a4.4.1.2._The_coord:endOfDaysint_n_EL_function"></a>4.4.1.2. The 
coord:endOfDays(int n) EL function</h5>
+<p>The <tt>${coord:endOfDays(int n)}</tt> EL function is identical to 
<tt>${coord:days(int n)}</tt>, except that it shifts the first occurrence to the 
end of the day for the specified timezone before computing the interval in 
minutes.</p>
+<p><b><font color="#008000"> Examples: </font></b></p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>Starting Nominal UTC time</b> </th>
+<th> <b>Timezone</b> </th>
+<th> <b>Usage</b>  </th>
+<th> <b>Value</b> </th>
+<th> <b>First Occurrence</b> </th>
+<th> <b>Comments</b> </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:endOfDays(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-01-02T00:00Z</tt> </td>
+<td> first occurrence in 2009JAN02 00:00 UTC time, <br /> first occurrence 
shifted to the end of the UTC day </td></tr>
+<tr class="a">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfDays(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-01-02T08:00Z</tt> </td>
+<td> first occurrence in 2009JAN02 08:00 UTC time, <br /> first occurrence 
shifted to the end of the PST8PDT day </td></tr>
+<tr class="b">
+<td> <tt>2009-01-01T08:01Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfDays(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-01-02T08:00Z</tt> </td>
+<td> first occurrence in 2009JAN02 08:00 UTC time, <br /> first occurrence 
shifted to the end of the PST8PDT day </td></tr>
+<tr class="a">
+<td> <tt>2009-01-01T18:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfDays(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-01-02T08:00Z</tt> </td>
+<td> first occurrence in 2009JAN02 08:00 UTC time, <br /> first occurrence 
shifted to the end of the PST8PDT day </td></tr>
+<tr class="b">
+<td colspan="5"> </td></tr>
+<tr class="a">
+<td> <tt>2009-03-07T09:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfDays(1)}</tt> </td>
+<td> 1380 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> first occurrence in 2009MAR08 08:00 UTC time <br /> first occurrence 
shifted to the end of the PST8PDT day </td></tr>
+<tr class="b">
+<td> <tt>2009-03-08T07:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfDays(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> first occurrence in 2009MAR08 08:00 UTC time <br /> first occurrence 
shifted to the end of the PST8PDT day </td></tr>
+<tr class="a">
+<td> <tt>2009-03-09T07:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfDays(1)}</tt> </td>
+<td> 1440 </td>
+<td> <tt>2009-03-10T07:00Z</tt> </td>
+<td> first occurrence in 2009MAR10 07:00 UTC time <br /> (2009MAR08 is DST 
switch in the US), <br /> first occurrence shifted to the end of the PST8PDT 
day </td></tr>
+</tbody>
+</table>
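+<p>The shift of the first occurrence shown in the table can be sketched in 
Python as follows. The function name <tt>end_of_day</tt> is hypothetical, and 
the rule it applies (always move to the next local midnight, even when the 
nominal time already falls on a midnight) is an assumption read off the table 
rows rather than taken from Oozie&#x2019;s source.</p>

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def end_of_day(nominal_utc, tz_name):
    """UTC instant of the local midnight that ends the day of nominal_utc."""
    tz = ZoneInfo(tz_name)
    next_day = nominal_utc.astimezone(tz).date() + timedelta(days=1)
    local_midnight = datetime(next_day.year, next_day.month, next_day.day, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


nominal = datetime(2009, 1, 1, 18, 0, tzinfo=timezone.utc)
print(end_of_day(nominal, "America/Los_Angeles").isoformat())
# 2009-01-02T08:00:00+00:00
```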
+
+<div>
+<div>
+<pre class="source">&lt;coordinator-app name=&quot;hello-coord&quot; 
frequency=&quot;${coord:days(1)}&quot;
+                  start=&quot;2009-01-02T08:00Z&quot; 
end=&quot;2009-01-04T08:00Z&quot; timezone=&quot;America/Los_Angeles&quot;
+                 xmlns=&quot;uri:oozie:coordinator:0.5&quot;&gt;
+      &lt;controls&gt;
+        &lt;timeout&gt;10&lt;/timeout&gt;
+        &lt;concurrency&gt;${concurrency_level}&lt;/concurrency&gt;
+        &lt;execution&gt;${execution_order}&lt;/execution&gt;
+        &lt;throttle&gt;${materialization_throttle}&lt;/throttle&gt;
+      &lt;/controls&gt;
+
+      &lt;datasets&gt;
+       &lt;dataset name=&quot;din&quot; 
frequency=&quot;${coord:endOfDays(1)}&quot;
+                initial-instance=&quot;2009-01-02T08:00Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+         
&lt;uri-template&gt;${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}&lt;/uri-template&gt;
+        &lt;/dataset&gt;
+       &lt;dataset name=&quot;dout&quot; 
frequency=&quot;${coord:minutes(30)}&quot;
+                initial-instance=&quot;2009-01-02T08:00Z&quot; 
timezone=&quot;UTC&quot;&gt;
+         
&lt;uri-template&gt;${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}&lt;/uri-template&gt;
+        &lt;/dataset&gt;
+      &lt;/datasets&gt;
+
+      &lt;input-events&gt;
+         &lt;data-in name=&quot;input&quot; dataset=&quot;din&quot;&gt;
+                               
&lt;instance&gt;${coord:current(0)}&lt;/instance&gt;
+         &lt;/data-in&gt;
+      &lt;/input-events&gt;
+
+      &lt;output-events&gt;
+         &lt;data-out name=&quot;output&quot; dataset=&quot;dout&quot;&gt;
+                               
&lt;instance&gt;${coord:current(1)}&lt;/instance&gt;
+         &lt;/data-out&gt;
+      &lt;/output-events&gt;
+
+      &lt;action&gt;
+        &lt;workflow&gt;
+          &lt;app-path&gt;${wf_app_path}&lt;/app-path&gt;
+          &lt;configuration&gt;
+              &lt;property&gt;
+              &lt;name&gt;wfInput&lt;/name&gt;
+              &lt;value&gt;${coord:dataIn('input')}&lt;/value&gt;
+            &lt;/property&gt;
+            &lt;property&gt;
+              &lt;name&gt;wfOutput&lt;/name&gt;
+              &lt;value&gt;${coord:dataOut('output')}&lt;/value&gt;
+            &lt;/property&gt;
+         &lt;/configuration&gt;
+       &lt;/workflow&gt;
+      &lt;/action&gt;
+ &lt;/coordinator-app&gt;
+</pre></div></div>
+</div></div>
+<div class="section">
+<h4><a 
name="a4.4.2._The_coord:monthsint_n_and_coord:endOfMonthsint_n_EL_functions"></a>4.4.2.
 The coord:months(int n) and coord:endOfMonths(int n) EL functions</h4>
+<p>The <tt>${coord:months(int n)}</tt> and <tt>${coord:endOfMonths(int 
n)}</tt> EL functions should be used to handle month based frequencies.</p>
+<p>Constant values cannot be used to indicate a month based frequency because 
the number of days in a month changes from month to month and in leap years; in 
addition, the number of hours in each day of the month is not always the same 
for timezones that observe daylight saving time.</p>
+<div class="section">
+<h5><a name="a4.4.2.1._The_coord:monthsint_n_EL_function"></a>4.4.2.1. The 
coord:months(int n) EL function</h5>
+<p>The <tt>${coord:months(int n)}</tt> EL function returns the number of 
minutes for &#x2018;n&#x2019; complete months starting with the month of the 
current nominal time for which the computation is being done.</p>
+<p>The <tt>${coord:months(int n)}</tt> EL function includes <b>all</b> the 
minutes of the current month, regardless of the day of the month of the current 
nominal time.</p>
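+<p>A comparable Python sketch for months is given here for illustration. The 
function name <tt>months_in_minutes</tt> is hypothetical and the code is not 
Oozie&#x2019;s implementation; it assumes a month runs from local midnight on 
the 1st to local midnight on the 1st of the following month.</p>

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def months_in_minutes(nominal_utc, tz_name, n):
    """Minutes in n complete local months, starting with the month of nominal_utc."""
    tz = ZoneInfo(tz_name)
    local = nominal_utc.astimezone(tz)
    # Count months linearly so that adding n can roll over into a later year.
    index = local.year * 12 + (local.month - 1) + n
    start = datetime(local.year, local.month, 1, tzinfo=tz)
    end = datetime(index // 12, index % 12 + 1, 1, tzinfo=tz)
    # Subtract in UTC so that a DST shift inside the window is counted.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return int(elapsed.total_seconds()) // 60


march = datetime(2009, 3, 8, 8, 0, tzinfo=timezone.utc)
print(months_in_minutes(march, "UTC", 1))                  # 44640
print(months_in_minutes(march, "America/Los_Angeles", 1))  # 44580
print(months_in_minutes(march, "America/Los_Angeles", 2))  # 87780
```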
+<p><b><font color="#008000"> Examples: </font></b></p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>Starting Nominal UTC time</b> </th>
+<th> <b>Timezone</b> </th>
+<th> <b>Usage</b>  </th>
+<th> <b>Value</b> </th>
+<th> <b>First Occurrence</b> </th>
+<th> <b>Comments</b> </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:months(1)}</tt> </td>
+<td> 44640 </td>
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td>total minutes for 2009JAN UTC time </td></tr>
+<tr class="a">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:months(1)}</tt> </td>
+<td> 44640 </td>
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> total minutes in 2009JAN PST8PDT time </td></tr>
+<tr class="b">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:months(2)}</tt> </td>
+<td> 84960 </td>
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> total minutes in 2009JAN and 2009FEB PST8PDT time </td></tr>
+<tr class="a">
+<td colspan="5"> </td></tr>
+<tr class="b">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:months(1)}</tt> </td>
+<td> 44640 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes on 2009MAR UTC time </td></tr>
+<tr class="a">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>Europe/London</tt> </td>
+<td> <tt>${coord:months(1)}</tt> </td>
+<td> 44580 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR BST1BDT time <br /> (2009MAR29 is DST switch in 
Europe) </td></tr>
+<tr class="b">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:months(1)}</tt> </td>
+<td> 44580 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR PST8PDT time <br /> (2009MAR08 is DST switch in 
the US) </td></tr>
+<tr class="a">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:months(2)}</tt> </td>
+<td> 87840 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR and 2009APR UTC time </td></tr>
+<tr class="b">
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:months(2)}</tt> </td>
+<td> 87780 </td>
+<td> <tt>2009-03-08T08:00Z</tt> </td>
+<td> total minutes in 2009MAR and 2009APR PST8PDT time <br /> (2009MAR08 is 
DST switch in US) </td></tr>
+</tbody>
+</table></div>
+<div class="section">
+<h5><a name="a4.4.2.2._The_coord:endOfMonthsint_n_EL_function"></a>4.4.2.2. 
The coord:endOfMonths(int n) EL function</h5>
+<p>The <tt>${coord:endOfMonths(int n)}</tt> EL function is identical to 
<tt>${coord:months(int n)}</tt>, except that it shifts the first occurrence to 
the end of the month for the specified timezone before computing the interval 
in minutes.</p>
+<p><b><font color="#008000"> Examples: </font></b></p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>Starting Nominal UTC time</b> </th>
+<th> <b>Timezone</b> </th>
+<th> <b>Usage</b>  </th>
+<th> <b>Value</b> </th>
+<th> <b>First Occurrence</b> </th>
+<th> <b>Comments</b> </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>2009-01-01T00:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:endOfMonths(1)}</tt> </td>
+<td> 40320 </td>
+<td> <tt>2009-02-01T00:00Z</tt> </td>
+<td> first occurrence in 2009FEB 00:00 UTC time </td></tr>
+<tr class="a">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:endOfMonths(1)}</tt> </td>
+<td> 40320 </td>
+<td> <tt>2009-02-01T00:00Z</tt> </td>
+<td> first occurrence in 2009FEB 00:00 UTC time </td></tr>
+<tr class="b">
+<td> <tt>2009-01-31T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:endOfMonths(1)}</tt> </td>
+<td> 40320 </td>
+<td> <tt>2009-02-01T00:00Z</tt> </td>
+<td> first occurrence in 2009FEB 00:00 UTC time </td></tr>
+<tr class="a">
+<td> <tt>2009-01-01T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfMonths(1)}</tt> </td>
+<td> 40320 </td>
+<td> <tt>2009-02-01T08:00Z</tt> </td>
+<td> first occurrence in 2009FEB 08:00 UTC time </td></tr>
+<tr class="b">
+<td> <tt>2009-02-02T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfMonths(1)}</tt> </td>
+<td> 44580  </td>
+<td> <tt>2009-03-01T08:00Z</tt> </td>
+<td> first occurrence in 2009MAR 08:00 UTC time </td></tr>
+<tr class="a">
+<td> <tt>2009-02-01T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfMonths(1)}</tt> </td>
+<td> 44580  </td>
+<td> <tt>2009-03-01T08:00Z</tt> </td>
+<td> first occurrence in 2009MAR 08:00 UTC time </td></tr>
+</tbody>
+</table>
+
+<div>
+<div>
+<pre class="source">&lt;coordinator-app name=&quot;hello-coord&quot; 
frequency=&quot;${coord:months(1)}&quot;
+                  start=&quot;2009-01-02T08:00Z&quot; 
end=&quot;2009-04-02T08:00Z&quot; timezone=&quot;America/Los_Angeles&quot;
+                 xmlns=&quot;uri:oozie:coordinator:0.5&quot;&gt;
+      &lt;controls&gt;
+        &lt;timeout&gt;10&lt;/timeout&gt;
+        &lt;concurrency&gt;${concurrency_level}&lt;/concurrency&gt;
+        &lt;execution&gt;${execution_order}&lt;/execution&gt;
+        &lt;throttle&gt;${materialization_throttle}&lt;/throttle&gt;
+      &lt;/controls&gt;
+
+      &lt;datasets&gt;
+       &lt;dataset name=&quot;din&quot; 
frequency=&quot;${coord:endOfMonths(1)}&quot;
+                initial-instance=&quot;2009-01-02T08:00Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+         
&lt;uri-template&gt;${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}&lt;/uri-template&gt;
+        &lt;/dataset&gt;
+       &lt;dataset name=&quot;dout&quot; 
frequency=&quot;${coord:minutes(30)}&quot;
+                initial-instance=&quot;2009-01-02T08:00Z&quot; 
timezone=&quot;UTC&quot;&gt;
+         
&lt;uri-template&gt;${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}&lt;/uri-template&gt;
+        &lt;/dataset&gt;
+      &lt;/datasets&gt;
+
+      &lt;input-events&gt;
+         &lt;data-in name=&quot;input&quot; dataset=&quot;din&quot;&gt;
+                               
&lt;instance&gt;${coord:current(0)}&lt;/instance&gt;
+         &lt;/data-in&gt;
+      &lt;/input-events&gt;
+
+      &lt;output-events&gt;
+         &lt;data-out name=&quot;output&quot; dataset=&quot;dout&quot;&gt;
+                               
&lt;instance&gt;${coord:current(1)}&lt;/instance&gt;
+         &lt;/data-out&gt;
+      &lt;/output-events&gt;
+
+      &lt;action&gt;
+        &lt;workflow&gt;
+          &lt;app-path&gt;${wf_app_path}&lt;/app-path&gt;
+          &lt;configuration&gt;
+              &lt;property&gt;
+              &lt;name&gt;wfInput&lt;/name&gt;
+              &lt;value&gt;${coord:dataIn('input')}&lt;/value&gt;
+            &lt;/property&gt;
+            &lt;property&gt;
+              &lt;name&gt;wfOutput&lt;/name&gt;
+              &lt;value&gt;${coord:dataOut('output')}&lt;/value&gt;
+            &lt;/property&gt;
+         &lt;/configuration&gt;
+       &lt;/workflow&gt;
+      &lt;/action&gt;
+ &lt;/coordinator-app&gt;
+</pre></div></div>
+</div></div>
+<div class="section">
+<h4><a name="a4.4.3._The_coord:endOfWeeksint_n_EL_function"></a>4.4.3. The 
coord:endOfWeeks(int n) EL function</h4>
+<p>The <tt>${coord:endOfWeeks(int n)}</tt> EL function shifts the first 
occurrence to the start of the week for the specified timezone before computing 
the interval in minutes. The start of the week depends on Java&#x2019;s 
implementation of <a class="externalLink" 
href="https://docs.oracle.com/javase/8/docs/api/java/util/Calendar.html#getFirstDayOfWeek--">Calendar.getFirstDayOfWeek()</a>; 
for example, the first day of the week is SUNDAY in the U.S. and MONDAY in France.</p>
+<p><b><font color="#008000"> Examples: </font></b></p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>Starting Nominal UTC time</b> </th>
+<th> <b>Timezone</b> </th>
+<th> <b>Usage</b>  </th>
+<th> <b>Value</b> </th>
+<th> <b>First Occurrence</b> </th>
+<th> <b>Comments</b> </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>2017-01-04T00:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:endOfWeeks(1)}</tt> </td>
+<td> 10080 </td>
+<td> <tt>2017-01-08T00:00Z</tt> </td>
+<td> first occurrence on 2017JAN08 00:00 UTC time </td></tr>
+<tr class="a">
+<td> <tt>2017-01-04T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:endOfWeeks(1)}</tt> </td>
+<td> 10080 </td>
+<td> <tt>2017-01-08T08:00Z</tt> </td>
+<td> first occurrence on 2017JAN08 08:00 UTC time </td></tr>
+<tr class="b">
+<td> <tt>2017-01-06T08:00Z</tt> </td>
+<td> <tt>UTC</tt> </td>
+<td> <tt>${coord:endOfWeeks(1)}</tt> </td>
+<td> 10080 </td>
+<td> <tt>2017-01-08T08:00Z</tt> </td>
+<td> first occurrence on 2017JAN08 08:00 UTC time </td></tr>
+<tr class="a">
+<td> <tt>2017-01-04T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfWeeks(1)}</tt> </td>
+<td> 10080 </td>
+<td> <tt>2017-01-08T08:00Z</tt> </td>
+<td> first occurrence in 2017JAN08 08:00 UTC time </td></tr>
+<tr class="b">
+<td> <tt>2017-01-06T08:00Z</tt> </td>
+<td> <tt>America/Los_Angeles</tt> </td>
+<td> <tt>${coord:endOfWeeks(1)}</tt> </td>
+<td> 10080 </td>
+<td> <tt>2017-01-08T08:00Z</tt> </td>
+<td> first occurrence in 2017JAN08 08:00 UTC time </td></tr>
+</tbody>
+</table>
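+<p>The week shift can be sketched in Python under the assumption that Sunday 
is the first day of the week (the U.S. convention mentioned above). The function 
name <tt>end_of_week</tt> and its <tt>first_day_of_week</tt> parameter are 
hypothetical. The sketch is not Oozie&#x2019;s implementation and has only been 
compared with the <tt>America/Los_Angeles</tt> rows and the first <tt>UTC</tt> 
row of the table.</p>

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def end_of_week(nominal_utc, tz_name, first_day_of_week=SUNDAY):
    """UTC instant of the local midnight that starts the next week."""
    tz = ZoneInfo(tz_name)
    today = nominal_utc.astimezone(tz).date()
    days_into_week = (today.weekday() - first_day_of_week) % 7
    next_start = today + timedelta(days=7 - days_into_week)
    local_midnight = datetime(next_start.year, next_start.month, next_start.day, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


nominal = datetime(2017, 1, 4, 8, 0, tzinfo=timezone.utc)
print(end_of_week(nominal, "America/Los_Angeles").isoformat())
# 2017-01-08T08:00:00+00:00
```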
+
+<div>
+<div>
+<pre class="source">&lt;coordinator-app name=&quot;hello-coord&quot; 
frequency=&quot;${coord:endOfWeeks(1)}&quot;
+                  start=&quot;2017-01-04T08:00Z&quot; 
end=&quot;2017-12-31T08:00Z&quot; timezone=&quot;America/Los_Angeles&quot;
+                 xmlns=&quot;uri:oozie:coordinator:0.5&quot;&gt;
+      &lt;controls&gt;
+        &lt;timeout&gt;10&lt;/timeout&gt;
+        &lt;concurrency&gt;${concurrency_level}&lt;/concurrency&gt;
+        &lt;execution&gt;${execution_order}&lt;/execution&gt;
+        &lt;throttle&gt;${materialization_throttle}&lt;/throttle&gt;
+      &lt;/controls&gt;
+
+      &lt;datasets&gt;
+       &lt;dataset name=&quot;din&quot; 
frequency=&quot;${coord:endOfWeeks(1)}&quot;
+                initial-instance=&quot;2017-01-01T08:00Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+         
&lt;uri-template&gt;${baseFsURI}/${YEAR}/${MONTH}/${DAY}&lt;/uri-template&gt;
+        &lt;/dataset&gt;
+       &lt;dataset name=&quot;dout&quot; 
frequency=&quot;${coord:endOfWeeks(1)}&quot;
+                initial-instance=&quot;2017-01-01T08:00Z&quot; 
timezone=&quot;UTC&quot;&gt;
+         
&lt;uri-template&gt;${baseFsURI}/${YEAR}/${MONTH}/${DAY}&lt;/uri-template&gt;
+        &lt;/dataset&gt;
+      &lt;/datasets&gt;
+
+      &lt;input-events&gt;
+         &lt;data-in name=&quot;input&quot; dataset=&quot;din&quot;&gt;
+            &lt;instance&gt;${coord:current(0)}&lt;/instance&gt;
+         &lt;/data-in&gt;
+      &lt;/input-events&gt;
+
+      &lt;output-events&gt;
+         &lt;data-out name=&quot;output&quot; dataset=&quot;dout&quot;&gt;
+            &lt;instance&gt;${coord:current(1)}&lt;/instance&gt;
+         &lt;/data-out&gt;
+      &lt;/output-events&gt;
+
+      &lt;action&gt;
+        &lt;workflow&gt;
+          &lt;app-path&gt;${wf_app_path}&lt;/app-path&gt;
+          &lt;configuration&gt;
+              &lt;property&gt;
+              &lt;name&gt;wfInput&lt;/name&gt;
+              &lt;value&gt;${coord:dataIn('input')}&lt;/value&gt;
+            &lt;/property&gt;
+            &lt;property&gt;
+              &lt;name&gt;wfOutput&lt;/name&gt;
+              &lt;value&gt;${coord:dataOut('output')}&lt;/value&gt;
+            &lt;/property&gt;
+         &lt;/configuration&gt;
+       &lt;/workflow&gt;
+      &lt;/action&gt;
+ &lt;/coordinator-app&gt;
+</pre></div></div>
+</div>
+<div class="section">
+<h4><a name="a4.4.4._Cron_syntax_in_coordinator_frequency"></a>4.4.4. Cron 
syntax in coordinator frequency</h4>
+<p>Oozie has historically allowed only very basic forms of scheduling: You 
could choose to run jobs separated by a certain number of minutes, hours, days 
or weeks. That&#x2019;s all. This works fine for processes that need to run 
continuously all year like building a search index to power an online 
website.</p>
+<p>However, there are a lot of cases that don&#x2019;t fit this model. For 
example, maybe you want to export data to a reporting system used during the 
day by business analysts. It would be wasteful to run the jobs when no analyst 
is going to take advantage of the new information, such as overnight. You might 
want a policy that says &#x201c;only run these jobs on weekdays between 6AM and 
8PM&#x201d;. Previous versions of Oozie didn&#x2019;t support this kind of 
complex scheduling policy without requiring multiple identical coordinators. 
Cron-scheduling improves the user experience in this area, allowing for a lot 
more flexibility.</p>
+<p>Cron is a standard time-based job scheduling mechanism in Unix-like 
operating systems. It is used extensively by system administrators to set up 
jobs and maintain software environments. Cron syntax generally consists of five 
fields (minutes, hours, day of month, month, and day of week, in that order), 
although multiple variations do exist.</p>
+
+<div>
+<div>
+<pre class="source">&lt;coordinator-app name=&quot;cron-coord&quot; 
frequency=&quot;0/10 1/2 * * *&quot; start=&quot;${start}&quot; 
end=&quot;${end}&quot; timezone=&quot;UTC&quot;
+                 xmlns=&quot;uri:oozie:coordinator:0.2&quot;&gt;
+        &lt;action&gt;
+        &lt;workflow&gt;
+            &lt;app-path&gt;${workflowAppUri}&lt;/app-path&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;jobTracker&lt;/name&gt;
+                    &lt;value&gt;${jobTracker}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;nameNode&lt;/name&gt;
+                    &lt;value&gt;${nameNode}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;queueName&lt;/name&gt;
+                    &lt;value&gt;${queueName}&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+        &lt;/workflow&gt;
+    &lt;/action&gt;
+&lt;/coordinator-app&gt;
+</pre></div></div>
+
+<p>Cron expressions consist of 5 required fields. The fields are described, 
in order, as follows:</p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>Field name</b> </th>
+<th> <b>Allowed Values</b> </th>
+<th> <b>Allowed Special Characters</b>  </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>Minutes</tt> </td>
+<td> <tt>0-59</tt> </td>
+<td> , - * / </td></tr>
+<tr class="a">
+<td> <tt>Hours</tt> </td>
+<td> <tt>0-23</tt> </td>
+<td> , - * / </td></tr>
+<tr class="b">
+<td> <tt>Day-of-month</tt> </td>
+<td> <tt>1-31</tt> </td>
+<td> , - * ? / L W </td></tr>
+<tr class="a">
+<td> <tt>Month</tt> </td>
+<td> <tt>1-12 or JAN-DEC</tt> </td>
+<td> , - * / </td></tr>
+<tr class="b">
+<td> <tt>Day-of-Week</tt> </td>
+<td> <tt>1-7 or SUN-SAT</tt> </td>
+<td> , - * ? / L #</td></tr>
+</tbody>
+</table>
+<p>The &#x2018;*&#x2019; character is used to specify all values. For 
example, &#x201c;*&#x201d; in the minute field means &#x201c;every 
minute&#x201d;.</p>
+<p>The &#x2018;?&#x2019; character is allowed for the day-of-month and 
day-of-week fields. It is used to specify &#x2018;no specific value&#x2019;. 
This is useful when you need to specify something in one of the two fields, but 
not the other.</p>
+<p>The &#x2018;-&#x2019; character is used to specify ranges. For example 
&#x201c;10-12&#x201d; in the hour field means &#x201c;the hours 10, 11 and 
12&#x201d;.</p>
+<p>The &#x2018;,&#x2019; character is used to specify additional values. For 
example &#x201c;MON,WED,FRI&#x201d; in the day-of-week field means &#x201c;the 
days Monday, Wednesday, and Friday&#x201d;.</p>
+<p>The &#x2018;/&#x2019; character is used to specify increments. For example 
&#x201c;0/15&#x201d; in the minutes field means &#x201c;the minutes 0, 15, 30, 
and 45&#x201d;, and &#x201c;5/15&#x201d; in the minutes field means &#x201c;the 
minutes 5, 20, 35, and 50&#x201d;. Specifying &#x2018;*&#x2019; before the 
&#x2018;/&#x2019; is equivalent to specifying 0 as the value to start with. 
Essentially, for each field in the expression, there is a set of numbers that 
can be turned on or off. For minutes, the numbers range from 0 to 59; for hours, 
0 to 23; for days of the month, 1 to 31; and for months, 1 to 12. The 
&#x201c;/&#x201d; character simply helps you turn on every &#x201c;nth&#x201d; 
value in the given set. Thus &#x201c;7/6&#x201d; in the month field only turns 
on month &#x201c;7&#x201d;; it does NOT mean every 6th month. Please note that 
subtlety.</p>
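+<p>The &#x2018;*&#x2019;, &#x2018;-&#x2019;, &#x2018;,&#x2019; and 
&#x2018;/&#x2019; rules above can be made concrete with a small Python sketch 
that expands one numeric field into the set of values it turns on. The function 
<tt>expand_field</tt> is hypothetical and is not the parser Oozie uses; it 
handles numeric values only (no names such as <tt>MON</tt>, no overflowing 
ranges, and none of the other special characters).</p>

```python
def expand_field(expr, low, high):
    """Expand one numeric cron field into the sorted values it turns on."""
    values = set()
    for part in expr.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
            if part == "*":
                # "*/n" starts at the low end of the field (0 for minutes and hours).
                part = str(low)
            if "-" not in part:
                # "a/n" starts at a and runs to the end of the field's range.
                part = f"{part}-{high}"
        if part == "*":
            first, last = low, high
        elif "-" in part:
            first, last = (int(x) for x in part.split("-"))
        else:
            first = last = int(part)
        values.update(range(first, last + 1, step))
    return sorted(values)


print(expand_field("0/15", 0, 59))   # [0, 15, 30, 45]
print(expand_field("5/15", 0, 59))   # [5, 20, 35, 50]
print(expand_field("10-12", 0, 23))  # [10, 11, 12]
print(expand_field("7/6", 1, 12))    # [7]
```

+<p>The last call shows the subtlety noted above: starting at month 7 with a 
step of 6, the next candidate would be 13, which is outside the month range.</p>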
+<p>The &#x2018;L&#x2019; character is allowed for the day-of-month and 
day-of-week fields. This character is short-hand for &#x201c;last&#x201d;, but 
it has a different meaning in each of the two fields. For example, the value 
&#x201c;L&#x201d; in the day-of-month field means &#x201c;the last day of the 
month&#x201d; - day 31 for January, day 28 for February on non-leap years. If 
used in the day-of-week field by itself, it simply means &#x201c;7&#x201d; or 
&#x201c;SAT&#x201d;. But if used in the day-of-week field after another value, 
it means &#x201c;the last xxx day of the month&#x201d; - for example 
&#x201c;6L&#x201d; means &#x201c;the last Friday of the month&#x201d;. You can 
also specify an offset from the last day of the month, such as 
&#x201c;L-3&#x201d; which would mean the third-to-last day of the calendar 
month. When using the &#x2018;L&#x2019; option, it is important not to specify 
lists, or ranges of values, as you&#x2019;ll get confusing/unexpected 
results.</p>
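+<p>The two plain uses of &#x2018;L&#x2019; can be illustrated with Python&#x2019;s 
<tt>calendar</tt> module. The function names are hypothetical and the sketch 
deliberately leaves out the &#x201c;L-n&#x201d; offset form. Day-of-week 
numbers follow the cron convention from the table above (1 = SUN, 7 = SAT).</p>

```python
import calendar
from datetime import date, timedelta


def last_day_of_month(year, month):
    """Date matched by 'L' in the day-of-month field."""
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month)


def last_weekday_of_month(year, month, cron_dow):
    """Date matched by 'nL' in the day-of-week field (cron 1=SUN ... 7=SAT)."""
    python_dow = (cron_dow - 2) % 7  # date.weekday() uses 0=MON ... 6=SUN
    day = last_day_of_month(year, month)
    # Walk backwards from the end of the month to the requested weekday.
    while day.weekday() != python_dow:
        day -= timedelta(days=1)
    return day


print(last_day_of_month(2009, 1))         # 2009-01-31
print(last_day_of_month(2009, 2))         # 2009-02-28
print(last_weekday_of_month(2009, 1, 6))  # 2009-01-30, the last Friday
```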
+<p>The &#x2018;W&#x2019; character is allowed for the day-of-month field. This 
character is used to specify the weekday (Monday-Friday) nearest the given day. 
As an example, if you were to specify &#x201c;15W&#x201d; as the value for the 
day-of-month field, the meaning is: &#x201c;the nearest weekday to the 15th of 
the month&#x201d;. So if the 15th is a Saturday, the trigger will fire on 
Friday the 14th. If the 15th is a Sunday, the trigger will fire on Monday the 
16th. If the 15th is a Tuesday, then it will fire on Tuesday the 15th. However 
if you specify &#x201c;1W&#x201d; as the value for day-of-month, and the 1st is 
a Saturday, the trigger will fire on Monday the 3rd, as it will not 
&#x2018;jump&#x2019; over the boundary of a month&#x2019;s days. The 
&#x2018;W&#x2019; character can only be specified when the day-of-month is a 
single day, not a range or list of days.</p>
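+<p>The &#x2018;W&#x2019; rule, including the month-boundary case, can be 
sketched in Python as follows. The function <tt>nearest_weekday</tt> is 
hypothetical and only mirrors the behaviour described in the paragraph above; 
it is not taken from Oozie or Quartz.</p>

```python
import calendar
from datetime import date


def nearest_weekday(year, month, day):
    """Weekday (Mon-Fri) nearest to the given day, staying inside the month."""
    target = date(year, month, day)
    days_in_month = calendar.monthrange(year, month)[1]
    weekday = target.weekday()  # 0=MON ... 5=SAT, 6=SUN
    if weekday == 5:
        # Saturday: use Friday, unless that would leave the month.
        return target.replace(day=day - 1) if day != 1 else target.replace(day=day + 2)
    if weekday == 6:
        # Sunday: use Monday, unless that would leave the month.
        return target.replace(day=day + 1) if day != days_in_month else target.replace(day=day - 2)
    return target


print(nearest_weekday(2009, 8, 15))  # 2009-08-14 (the 15th is a Saturday)
print(nearest_weekday(2009, 2, 15))  # 2009-02-16 (the 15th is a Sunday)
print(nearest_weekday(2009, 9, 15))  # 2009-09-15 (the 15th is a Tuesday)
print(nearest_weekday(2009, 8, 1))   # 2009-08-03 (the 1st is a Saturday)
```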
+<p>The &#x2018;L&#x2019; and &#x2018;W&#x2019; characters can also be combined 
for the day-of-month expression to yield &#x2018;LW&#x2019;, which translates 
to &#x201c;last weekday of the month&#x201d;.</p>
+<p>The &#x2018;#&#x2019; character is allowed for the day-of-week field. This 
character is used to specify &#x201c;the nth&#x201d; XXX day of the month. For 
example, the value of &#x201c;6#3&#x201d; in the day-of-week field means the 
third Friday of the month (day 6 = Friday and &#x201c;#3&#x201d; = the 3rd one 
in the month). Other examples: &#x201c;2#1&#x201d; = the first Monday of the 
month and &#x201c;4#5&#x201d; = the fifth Wednesday of the month. Note that if 
you specify &#x201c;#5&#x201d; and there are not 5 of the given day-of-week in 
the month, then no firing will occur that month. If the &#x2018;#&#x2019; 
character is used, there can only be one expression in the day-of-week field 
(&#x201c;3#1,6#3&#x201d; is not valid, since there are two expressions).</p>
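+<p>A short Python sketch of the &#x2018;#&#x2019; rule follows. The function 
<tt>nth_weekday_of_month</tt> is hypothetical; it returns <tt>None</tt> when the 
month has no such day, which corresponds to &#x201c;no firing will occur that 
month&#x201d;. Day-of-week numbers follow the cron convention (1 = SUN, 
7 = SAT).</p>

```python
from datetime import date, timedelta


def nth_weekday_of_month(year, month, cron_dow, n):
    """Date of the n-th given weekday (cron 1=SUN ... 7=SAT), or None."""
    python_dow = (cron_dow - 2) % 7  # date.weekday() uses 0=MON ... 6=SUN
    first = date(year, month, 1)
    # Days from the 1st to the first occurrence of the requested weekday.
    offset = (python_dow - first.weekday()) % 7
    candidate = first + timedelta(days=offset + 7 * (n - 1))
    return candidate if candidate.month == month else None


print(nth_weekday_of_month(2009, 1, 6, 3))  # 2009-01-16, third Friday
print(nth_weekday_of_month(2009, 1, 2, 1))  # 2009-01-05, first Monday
print(nth_weekday_of_month(2009, 1, 4, 5))  # None, only four Wednesdays
```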
+<p>The legal characters and the names of months and days of the week are not 
case sensitive.</p>
+<p>If a user specifies an invalid cron expression, for example 
&#x201c;0 10 30 2 *&#x201d; to run something on February 30th, the coordinator 
job will not be created and an invalid coordinator frequency parse exception 
will be thrown.</p>
+<p>If a user has a coordinator job that materializes no action during run 
time, for example a frequency of &#x201c;0 10 * * *&#x201d; with a start time of 
2013-10-18T21:00Z and an end time of 2013-10-18T22:00Z, the coordinator job 
submission will be rejected and an invalid coordinator attribute exception will 
be thrown.</p>
+<p><b><font color="#008000"> Examples: </font></b></p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>Cron Expression</b> </th>
+<th> <b>Meaning</b> </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>10 9 * * *</tt> </td>
+<td> Runs every day at 9:10am </td></tr>
+<tr class="a">
+<td> <tt>10,30,45 9 * * *</tt> </td>
+<td> Runs every day at 9:10am, 9:30am, and 9:45am </td></tr>
+<tr class="b">
+<td> <tt>0 * 30 JAN 2-6</tt> </td>
+<td> Runs at minute 0 of every hour on weekdays and on the 30th of January </td></tr>
+<tr class="a">
+<td> <tt>0/20 9-17 * * 2-5</tt> </td>
+<td> Runs every Mon, Tue, Wed, and Thu at minutes 0, 20, and 40 from 9am to 5pm 
</td></tr>
+<tr class="b">
+<td> <tt>1 2 L-3 * *</tt> </td>
+<td> Runs on the third-to-last day of every month at 2:01am </td></tr>
+<tr class="a">
+<td> <tt>1 2 6W 3 ?</tt> </td>
+<td> Runs on the nearest weekday to March 6th every year at 2:01am </td></tr>
+<tr class="b">
+<td> <tt>1 2 * 3 3#2</tt> </td>
+<td> Runs on the second Tuesday of March at 2:01am every year </td></tr>
+<tr class="a">
+<td> <tt>0 10,13 * * MON-FRI</tt> </td>
+<td> Runs every weekday at 10am and 1pm </td></tr>
+</tbody>
+</table>
+<p>NOTES:</p>
+
+<div>
+<div>
+<pre class="source">Cron expression and syntax in Oozie are inspired by 
Quartz: http://quartz-scheduler.org/api/2.0.0/org/quartz/CronExpression.html.
+However, there is a major difference between Quartz cron and Oozie cron: 
Oozie cron doesn't have a &quot;Seconds&quot; field,
+since everything in Oozie functions at minute granularity at most. 
Everything related to Oozie cron syntax should be based
+on the Oozie documentation.
+
+Cron expressions use the Oozie server processing timezone. Since the default 
Oozie processing timezone is UTC, if you want to
+run a job on every weekday at 10am in Tokyo, Japan (UTC+9), your cron 
expression should be &quot;0 1 * * 2-6&quot; instead of
+the &quot;0 10 * * 2-6&quot; which you might expect.
+
+Overflowing ranges are supported but strongly discouraged - that is, having a 
larger number on the left hand side than the right.
+You might do 22-2 to catch 10 o'clock at night until 2 o'clock in the morning, 
or you might have NOV-FEB.
+It is very important to note that overuse of overflowing ranges creates ranges 
that don't make sense and
+no effort has been made to determine which interpretation CronExpression 
chooses.
+An example would be &quot;0 14-6 ? * FRI-MON&quot;.
+</pre></div></div>
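+<p>The Tokyo example in the notes above comes down to simple hour arithmetic. The following sketch shows it, assuming a whole-hour UTC offset and no daylight saving; the helper name <tt>local_to_utc_cron_hour</tt> is hypothetical and is not part of Oozie.</p>

```python
def local_to_utc_cron_hour(local_hour, utc_offset_hours):
    """Convert a local wall-clock hour to the UTC hour to put in the cron expression.

    Returns (utc_hour, day_shift). A day_shift of -1 or +1 means the UTC day
    differs from the local day, so the day-of-week field has to be shifted too.
    Assumption: whole-hour offsets only, no daylight saving time.
    """
    shifted = local_hour - utc_offset_hours
    return shifted % 24, shifted // 24


# 10am in Tokyo (UTC+9) is 1am UTC on the same day, hence "0 1 * * 2-6".
tokyo_hour, tokyo_day_shift = local_to_utc_cron_hour(10, 9)
print(tokyo_hour, tokyo_day_shift)
```

+<p>When the day shift is not zero (for example 5am in Tokyo is 20:00 UTC on the previous day), the day-of-week range in the expression has to move by the same amount.</p>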
+</div></div></div>
+<div class="section">
+<h2><a name="a5._Dataset"></a>5. Dataset</h2>
+<p>A dataset is a collection of data referred to by a logical name.</p>
+<p>A dataset instance is a particular occurrence of a dataset and it is 
represented by a unique set of URIs. A dataset instance can be referred to 
individually. Dataset instances for datasets containing ranges are identified by a 
set of unique URIs, otherwise a dataset instance is identified by a single 
unique URI.</p>
+<p>Datasets are typically defined in some central place for a business domain 
and can be accessed by the coordinator. Because of this, they can be defined 
once and used many times.</p>
+<p>A dataset is a synchronous input: it is produced at regular time intervals 
and has an expected frequency.</p>
+<p>A dataset instance is considered to be immutable while it is being consumed 
by coordinator jobs.</p>
+<div class="section">
+<h3><a name="a5.1._Synchronous_Datasets"></a>5.1. Synchronous Datasets</h3>
+<p>Instances of synchronous datasets are produced at regular time intervals, 
at an expected frequency. They are also referred to as &#x201c;clocked 
datasets&#x201d;.</p>
+<p>Synchronous dataset instances are identified by their nominal creation 
time. The nominal creation time is normally specified in the dataset instance 
URI.</p>
+<p>A synchronous dataset definition contains the following information:</p>
+<ul>
+
+<li><b><font color="#0000ff"> name: </font></b> The dataset name. It must be a 
valid Java identifier.</li>
+<li><b><font color="#0000ff"> frequency: </font></b> It represents the rate, 
in minutes, at which data is <i>periodically</i> created. The granularity is in 
minutes and can be expressed using EL expressions, for example: ${5 * 
HOUR}.</li>
+<li><b><font color="#0000ff"> initial-instance: </font></b> The UTC datetime 
of the initial instance of the dataset. The initial-instance also provides the 
baseline datetime to compute instances of the dataset using multiples of the 
frequency.</li>
+<li><b><font color="#0000ff"> timezone:</font></b> The timezone of the 
dataset.</li>
+<li><b><font color="#0000ff"> uri-template:</font></b> The URI template that 
identifies the dataset and can be resolved into concrete URIs to identify a 
particular dataset instance. The URI template is constructed using:
+<ul>
+
+<li><b><font color="#0000ff"> constants </font></b> See the allowable EL Time 
Constants below. Ex: ${YEAR}/${MONTH}.</li>
+<li><b><font color="#0000ff"> variables </font></b> Variables must be resolved 
at the time a coordinator job is submitted to the coordinator engine. They are 
normally provided as job parameters (configuration properties). Ex: 
${market}/${language}</li>
+</ul>
+</li>
+<li><b><font color="#0000ff"> done-flag:</font></b> This flag denotes when a 
dataset instance is ready to be consumed.
+<ul>
+
+<li>If the done-flag is omitted the coordinator will wait for the presence of 
a _SUCCESS file in the directory (Note: MapReduce jobs create this on 
successful completion automatically).</li>
+<li>If the done-flag is present but empty, then the existence of the directory 
itself indicates that the dataset is ready.</li>
+<li>If the done-flag is present and non-empty, Oozie will check for the 
presence of the named file within the directory, and the dataset instance is 
considered ready (done) when the file exists.</li>
+</ul>
+</li>
+</ul>
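+<p>The three done-flag cases listed above can be summarized as a small function. This is only a sketch of the rule as described here, not Oozie code: the name <tt>readiness_path</tt> is hypothetical, and <tt>None</tt> is used to model an omitted done-flag element.</p>

```python
def readiness_path(instance_uri, done_flag=None):
    """Return the path whose existence marks a dataset instance as ready.

    done_flag=None  -> element omitted: wait for the default _SUCCESS file
    done_flag=""    -> element present but empty: the directory itself
    done_flag="x"   -> element names a file: wait for that file
    """
    base = instance_uri.rstrip("/")
    if done_flag is None:
        return base + "/_SUCCESS"
    name = done_flag.strip()
    if name == "":
        return base
    return base + "/" + name


instance = "hdfs://foo:8020/usr/app/stats/2009/01/data"
print(readiness_path(instance))
print(readiness_path(instance, ""))
print(readiness_path(instance, "trigger.dat"))
```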
+<p>The following EL constants can be used within synchronous dataset URI 
templates:</p>
+<table border="0" class="table table-striped">
+<thead>
+
+<tr class="a">
+<th> <b>EL Constant</b> </th>
+<th> <b>Resulting Format</b> </th>
+<th> <b>Comments</b>  </th></tr>
+</thead><tbody>
+
+<tr class="b">
+<td> <tt>YEAR</tt> </td>
+<td> <i>YYYY</i> </td>
+<td> 4 digits representing the year </td></tr>
+<tr class="a">
+<td> <tt>MONTH</tt> </td>
+<td> <i>MM</i> </td>
+<td> 2 digits representing the month of the year, January = 1 </td></tr>
+<tr class="b">
+<td> <tt>DAY</tt> </td>
+<td> <i>DD</i> </td>
+<td> 2 digits representing the day of the month </td></tr>
+<tr class="a">
+<td> <tt>HOUR</tt> </td>
+<td> <i>HH</i> </td>
+<td> 2 digits representing the hour of the day, in 24 hour format, 0 - 23 
</td></tr>
+<tr class="b">
+<td> <tt>MINUTE</tt> </td>
+<td> <i>mm</i> </td>
+<td> 2 digits representing the minute of the hour, 0 - 59 </td></tr>
+</tbody>
+</table>
+<p><b><font color="#800080">Syntax: </font></b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;dataset name=&quot;[NAME]&quot; 
frequency=&quot;[FREQUENCY]&quot;
+           initial-instance=&quot;[DATETIME]&quot; 
timezone=&quot;[TIMEZONE]&quot;&gt;
+    &lt;uri-template&gt;[URI TEMPLATE]&lt;/uri-template&gt;
+    &lt;done-flag&gt;[FILE NAME]&lt;/done-flag&gt;
+  &lt;/dataset&gt;
+</pre></div></div>
+
+<p>IMPORTANT: The values of the EL constants in the dataset URIs (in HDFS) are 
expected in UTC. Oozie Coordinator takes care of the timezone conversion when 
performing calculations.</p>
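+<p>To make the URI template resolution concrete, here is a rough sketch of how the EL time constants could be substituted from an instance's nominal time in UTC. The function names are hypothetical, and stepping by a fixed 24 hours is a simplifying assumption that ignores the daylight saving handling Oozie performs for <tt>coord:days</tt>.</p>

```python
from datetime import datetime, timedelta, timezone


def resolve_uri(template, nominal_utc, params=None):
    """Substitute ${YEAR}, ${MONTH}, ${DAY}, ${HOUR}, ${MINUTE} and job parameters."""
    values = dict(params or {})
    # The EL time constants are taken from the nominal time, expressed in UTC.
    values.update({
        "YEAR": f"{nominal_utc.year:04d}",
        "MONTH": f"{nominal_utc.month:02d}",
        "DAY": f"{nominal_utc.day:02d}",
        "HOUR": f"{nominal_utc.hour:02d}",
        "MINUTE": f"{nominal_utc.minute:02d}",
    })
    uri = template
    for name, value in values.items():
        uri = uri.replace("${" + name + "}", value)
    return uri


def daily_instances(template, initial_instance, count, params=None):
    """Resolve the first `count` instances of a once-a-day dataset (fixed 24h steps)."""
    return [
        resolve_uri(template, initial_instance + timedelta(days=n), params)
        for n in range(count)
    ]


template = "hdfs://foo:8020/usr/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}/data"
initial = datetime(2009, 1, 1, 10, 30, tzinfo=timezone.utc)
uris = daily_instances(template, initial, 3)
for uri in uris:
    print(uri)
```

+<p>Because the initial instance fixes the hour and minute, a template with finer precision than the frequency resolves to the same hour and minute components in every URI.</p>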
+<p><b><font color="#008000"> Examples: </font></b></p>
+<ol style="list-style-type: decimal">
+
+<li>
+
+<p><b>A dataset produced once every day at 00:15 PST8PDT and done-flag is set 
to empty:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;dataset name=&quot;logs&quot; 
frequency=&quot;${coord:days(1)}&quot;
+           initial-instance=&quot;2009-02-15T08:15Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    &lt;uri-template&gt;
+      hdfs://foo:8020/app/logs/${market}/${YEAR}${MONTH}/${DAY}/data
+    &lt;/uri-template&gt;
+    &lt;done-flag&gt;&lt;/done-flag&gt;
+  &lt;/dataset&gt;
+</pre></div></div>
+
+<p>The dataset would resolve to the following URIs and Coordinator looks for 
the existence of the directory itself:</p>
+
+<div>
+<div>
+<pre class="source">  [market] will be replaced with user given property.
+
+  hdfs://foo:8020/app/logs/[market]/200902/15/data
+  hdfs://foo:8020/app/logs/[market]/200902/16/data
+  hdfs://foo:8020/app/logs/[market]/200902/17/data
+  ...
+</pre></div></div>
+</li>
+<li>
+
+<p><b>A dataset available on the 10th of each month and done-flag is default 
&#x2018;_SUCCESS&#x2019;:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;dataset name=&quot;stats&quot; 
frequency=&quot;${coord:months(1)}&quot;
+           initial-instance=&quot;2009-01-10T10:00Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    
&lt;uri-template&gt;hdfs://foo:8020/usr/app/stats/${YEAR}/${MONTH}/data&lt;/uri-template&gt;
+  &lt;/dataset&gt;
+</pre></div></div>
+
+<p>The dataset would resolve to the following URIs:</p>
+
+<div>
+<div>
+<pre class="source">  hdfs://foo:8020/usr/app/stats/2009/01/data
+  hdfs://foo:8020/usr/app/stats/2009/02/data
+  hdfs://foo:8020/usr/app/stats/2009/03/data
+  ...
+</pre></div></div>
+
+<p>The dataset instances are not ready until &#x2018;_SUCCESS&#x2019; exists 
in each path:</p>
+
+<div>
+<div>
+<pre class="source">  hdfs://foo:8020/usr/app/stats/2009/01/data/_SUCCESS
+  hdfs://foo:8020/usr/app/stats/2009/02/data/_SUCCESS
+  hdfs://foo:8020/usr/app/stats/2009/03/data/_SUCCESS
+  ...
+</pre></div></div>
+</li>
+<li>
+
+<p><b>A dataset available at the end of every quarter and done-flag is 
&#x2018;trigger.dat&#x2019;:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;dataset name=&quot;stats&quot; 
frequency=&quot;${coord:months(3)}&quot;
+           initial-instance=&quot;2009-01-31T20:00Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    &lt;uri-template&gt;
+      hdfs://foo:8020/usr/app/stats/${YEAR}/${MONTH}/data
+    &lt;/uri-template&gt;
+    &lt;done-flag&gt;trigger.dat&lt;/done-flag&gt;
+  &lt;/dataset&gt;
+</pre></div></div>
+
+<p>The dataset would resolve to the following URIs:</p>
+
+<div>
+<div>
+<pre class="source">  hdfs://foo:8020/usr/app/stats/2009/01/data
+  hdfs://foo:8020/usr/app/stats/2009/04/data
+  hdfs://foo:8020/usr/app/stats/2009/07/data
+  ...
+</pre></div></div>
+
+<p>The dataset instances are not ready until &#x2018;trigger.dat&#x2019; 
exists in each path:</p>
+
+<div>
+<div>
+<pre class="source">  hdfs://foo:8020/usr/app/stats/2009/01/data/trigger.dat
+  hdfs://foo:8020/usr/app/stats/2009/04/data/trigger.dat
+  hdfs://foo:8020/usr/app/stats/2009/07/data/trigger.dat
+  ...
+</pre></div></div>
+</li>
+<li>
+
+<p><b>Normally the URI template of a dataset has a precision similar to the 
frequency:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;dataset name=&quot;logs&quot; 
frequency=&quot;${coord:days(1)}&quot;
+           initial-instance=&quot;2009-01-01T10:30Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    &lt;uri-template&gt;
+      hdfs://foo:8020/usr/app/logs/${YEAR}/${MONTH}/${DAY}/data
+    &lt;/uri-template&gt;
+  &lt;/dataset&gt;
+</pre></div></div>
+
+<p>The dataset would resolve to the following URIs:</p>
+
+<div>
+<div>
+<pre class="source">  hdfs://foo:8020/usr/app/logs/2009/01/01/data
+  hdfs://foo:8020/usr/app/logs/2009/01/02/data
+  hdfs://foo:8020/usr/app/logs/2009/01/03/data
+  ...
+</pre></div></div>
+</li>
+<li>
+
+<p><b>However, if the URI template has a finer precision than the dataset 
frequency:</b></p>
+
+<div>
+<div>
+<pre class="source">  &lt;dataset name=&quot;logs&quot; 
frequency=&quot;${coord:days(1)}&quot;
+           initial-instance=&quot;2009-01-01T10:30Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    &lt;uri-template&gt;
+      
hdfs://foo:8020/usr/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}/data
+    &lt;/uri-template&gt;
+  &lt;/dataset&gt;
+</pre></div></div>
+
+<p>The dataset resolves to the following URIs with fixed values for the finer 
precision template variables:</p>
+
+<div>
+<div>
+<pre class="source">  hdfs://foo:8020/usr/app/logs/2009/01/01/10/30/data
+  hdfs://foo:8020/usr/app/logs/2009/01/02/10/30/data
+  hdfs://foo:8020/usr/app/logs/2009/01/03/10/30/data
+  ...
+</pre></div></div>
+</li>
+</ol></div>
+<div class="section">
+<h3><a name="a5.2._Dataset_URI-Template_types"></a>5.2. Dataset URI-Template 
types</h3>
+<p>Each dataset URI could be an HDFS path URI denoting an HDFS directory: 
<tt>hdfs://foo:8020/usr/logs/20090415</tt> or an HCatalog partition URI 
identifying a set of table partitions: 
<tt>hcat://bar:8020/logsDB/logsTable/dt=20090415;region=US</tt>.</p>
+<p>HCatalog enables table and storage management for Pig, Hive and MapReduce. 
The format to specify an HCatalog table partition URI is <tt>hcat://[metastore 
server]:[port]/[database name]/[table 
name]/[partkey1]=[value];[partkey2]=[value];...</tt></p>
+<p>For example,</p>
+
+<div>
+<div>
+<pre class="source">  &lt;dataset name=&quot;logs&quot; 
frequency=&quot;${coord:days(1)}&quot;
+           initial-instance=&quot;2009-02-15T08:15Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    &lt;uri-template&gt;
+      
hcat://myhcatmetastore:9080/database1/table1/myfirstpartitionkey=myfirstvalue;mysecondpartitionkey=mysecondvalue
+    &lt;/uri-template&gt;
+    &lt;done-flag&gt;&lt;/done-flag&gt;
+  &lt;/dataset&gt;
+</pre></div></div>
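+<p>The structure of an HCatalog partition URI can be illustrated by splitting one into its parts. The sketch below uses only the Python standard library; the function name <tt>parse_hcat_uri</tt> and the returned dictionary layout are assumptions made for illustration and do not mirror any Oozie or HCatalog API.</p>

```python
from urllib.parse import urlsplit


def parse_hcat_uri(uri):
    """Split hcat://server:port/database/table/k1=v1;k2=v2 into its components."""
    parts = urlsplit(uri)
    if parts.scheme != "hcat":
        raise ValueError("not an hcat URI: " + uri)
    # Path layout: /<database>/<table>/<partition spec>
    database, table, partition_spec = parts.path.lstrip("/").split("/", 2)
    partitions = dict(
        pair.split("=", 1) for pair in partition_spec.split(";") if pair
    )
    return {
        "server": parts.hostname,
        "port": parts.port,
        "database": database,
        "table": table,
        "partitions": partitions,
    }


parsed = parse_hcat_uri("hcat://bar:8020/logsDB/logsTable/dt=20090415;region=US")
print(parsed)
```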
+</div>
+<div class="section">
+<h3><a name="a5.3._Asynchronous_Datasets"></a>5.3. Asynchronous Datasets</h3>
+<ul>
+
+<li>TBD</li>
+</ul></div>
+<div class="section">
+<h3><a name="a5.4._Dataset_Definitions"></a>5.4. Dataset Definitions</h3>
+<p>Dataset definitions are grouped in XML files. <b>IMPORTANT:</b> Please note 
that if an XML namespace version is specified for the coordinator-app element 
in the coordinator.xml file, no namespace needs to be defined separately for 
the datasets element (even if the dataset is defined in a separate file). 
Specifying it in multiple places might result in XML errors while submitting 
the coordinator job.</p>
+<p><b><font color="#800080">Syntax: </font></b></p>
+
+<div>
+<div>
+<pre class="source"> &lt;!-- Synchronous datasets --&gt;
+&lt;datasets&gt;
+  &lt;include&gt;[SHARED_DATASETS]&lt;/include&gt;
+  ...
+  &lt;dataset name=&quot;[NAME]&quot; frequency=&quot;[FREQUENCY]&quot;
+           initial-instance=&quot;[DATETIME]&quot; 
timezone=&quot;[TIMEZONE]&quot;&gt;
+    &lt;uri-template&gt;[URI TEMPLATE]&lt;/uri-template&gt;
+  &lt;/dataset&gt;
+  ...
+&lt;/datasets&gt;
+</pre></div></div>
+
+<p><b><font color="#008000"> Example: </font></b></p>
+
+<div>
+<div>
+<pre class="source">&lt;datasets&gt;
+
+  
&lt;include&gt;hdfs://foo:8020/app/dataset-definitions/globallogs.xml&lt;/include&gt;
+
+  &lt;dataset name=&quot;logs&quot; frequency=&quot;${coord:hours(12)}&quot;
+           initial-instance=&quot;2009-02-15T08:15Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    &lt;uri-template&gt;
+    
hdfs://foo:8020/app/logs/${market}/${YEAR}${MONTH}/${DAY}/${HOUR}/${MINUTE}/data
+    &lt;/uri-template&gt;
+  &lt;/dataset&gt;
+
+  &lt;dataset name=&quot;stats&quot; frequency=&quot;${coord:months(1)}&quot;
+           initial-instance=&quot;2009-01-10T10:00Z&quot; 
timezone=&quot;America/Los_Angeles&quot;&gt;
+    
&lt;uri-template&gt;hdfs://foo:8020/usr/app/stats/${YEAR}/${MONTH}/data&lt;/uri-template&gt;
+  &lt;/dataset&gt;
+
+&lt;/datasets&gt;
+</pre></div></div>
+</div></div>
+<div class="section">
+<h2><a name="a6._Coordinator_Application"></a>6. Coordinator Application</h2>
+<div class="section">
+<h3><a name="a6.1._Concepts"></a>6.1. Concepts</h3>
+<div class="section">
+<h4><a name="a6.1.1._Coordinator_Application"></a>6.1.1. Coordinator 
Application</h4>
+<p>A coordinator application is a program that triggers actions (commonly 
workflow jobs) when a set of conditions is met. Conditions can be a time 
frequency, the availability of new dataset instances or other external 
events.</p>
+<p>Types of coordinator applications:</p>
+<ul>
+
+<li><b>Synchronous:</b> Its coordinator actions are created at specified time 
intervals.</li>
+</ul>
+<p>Coordinator applications are normally parameterized.</p></div>
+<div class="section">
+<h4><a name="a6.1.2._Coordinator_Job"></a>6.1.2. Coordinator Job</h4>
+<p>To create a coordinator job, a job configuration that resolves all 
coordinator application parameters must be provided to the coordinator 
engine.</p>
+<p>A coordinator job is a running instance of a coordinator application 
running from a start time to an end time. The start time must be earlier than 
the end time.</p>
+<p>At any time, a coordinator job is in one of the following statuses: <b>PREP, 
RUNNING, RUNNINGWITHERROR, PREPSUSPENDED, SUSPENDED, SUSPENDEDWITHERROR, 
PREPPAUSED, PAUSED, PAUSEDWITHERROR, SUCCEEDED, DONEWITHERROR, KILLED, 
FAILED, IGNORED</b>.</p>
+<p>Valid coordinator job status transitions are:</p>
+<ul>
+
+<li><b>PREP &#x2013;&gt; PREPSUSPENDED | PREPPAUSED | RUNNING | KILLED</b></li>
+<li><b>RUNNING &#x2013;&gt; RUNNINGWITHERROR | SUSPENDED | PAUSED | SUCCEEDED 
| KILLED</b></li>
+<li><b>RUNNINGWITHERROR &#x2013;&gt; RUNNING | SUSPENDEDWITHERROR | 
PAUSEDWITHERROR | DONEWITHERROR | KILLED | FAILED</b></li>
+<li><b>PREPSUSPENDED &#x2013;&gt; PREP | KILLED</b></li>
+<li><b>SUSPENDED &#x2013;&gt; RUNNING | KILLED</b></li>
+<li><b>SUSPENDEDWITHERROR &#x2013;&gt; RUNNINGWITHERROR | KILLED</b></li>
+<li><b>PREPPAUSED &#x2013;&gt; PREP | KILLED</b></li>
+<li><b>PAUSED &#x2013;&gt; SUSPENDED | RUNNING | KILLED</b></li>
+<li><b>PAUSEDWITHERROR &#x2013;&gt; SUSPENDEDWITHERROR | RUNNINGWITHERROR | 
KILLED</b></li>
+<li><b>FAILED | KILLED &#x2013;&gt; IGNORED</b></li>
+<li><b>IGNORED &#x2013;&gt; RUNNING</b></li>
+</ul>
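+<p>The transition list above can be read as a lookup table. The sketch below encodes exactly the transitions listed and checks whether a given move is allowed; the names <tt>JOB_TRANSITIONS</tt> and <tt>can_transition</tt> are illustrative only and are not Oozie identifiers.</p>

```python
# Each key is a current job status; the value is the set of statuses it may move to.
JOB_TRANSITIONS = {
    "PREP": {"PREPSUSPENDED", "PREPPAUSED", "RUNNING", "KILLED"},
    "RUNNING": {"RUNNINGWITHERROR", "SUSPENDED", "PAUSED", "SUCCEEDED", "KILLED"},
    "RUNNINGWITHERROR": {
        "RUNNING", "SUSPENDEDWITHERROR", "PAUSEDWITHERROR",
        "DONEWITHERROR", "KILLED", "FAILED",
    },
    "PREPSUSPENDED": {"PREP", "KILLED"},
    "SUSPENDED": {"RUNNING", "KILLED"},
    "SUSPENDEDWITHERROR": {"RUNNINGWITHERROR", "KILLED"},
    "PREPPAUSED": {"PREP", "KILLED"},
    "PAUSED": {"SUSPENDED", "RUNNING", "KILLED"},
    "PAUSEDWITHERROR": {"SUSPENDEDWITHERROR", "RUNNINGWITHERROR", "KILLED"},
    "FAILED": {"IGNORED"},
    "KILLED": {"IGNORED"},
    "IGNORED": {"RUNNING"},
}


def can_transition(current, target):
    """Return True if the list above allows moving from `current` to `target`."""
    return target in JOB_TRANSITIONS.get(current, set())


print(can_transition("PREP", "RUNNING"))
print(can_transition("SUCCEEDED", "RUNNING"))
```

+<p>Statuses with no entry in the table, such as SUCCEEDED and DONEWITHERROR, have no outgoing transitions in the list.</p>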
+<p>When a coordinator job is submitted, Oozie parses the coordinator job XML. 
Oozie then creates a record for the coordinator with status <b>PREP</b> and 
returns a unique ID. The coordinator is also started immediately if pause time 
is not set.</p>
+<p>When a user requests to suspend a coordinator job that is in <b>PREP</b> 
state, Oozie puts the job in status <b>PREPSUSPENDED</b>. Similarly, when the pause 
time is reached for a coordinator job with <b>PREP</b> status, Oozie puts the job 
in status <b>PREPPAUSED</b>.</p>
+<p>Conversely, when a user requests to resume a <b>PREPSUSPENDED</b> 
coordinator job, Oozie puts the job in status <b>PREP</b>. And when pause time 
is reset for a coordinator job and job status is <b>PREPPAUSED</b>, Oozie puts 
the job in status <b>PREP</b>.</p>
+<p>When a coordinator job starts, Oozie puts the job in status <b>RUNNING</b> 
and starts materializing workflow jobs based on job frequency. If any workflow 
job goes to <b>FAILED/KILLED/TIMEDOUT</b> state, the coordinator job is put in 
<b>RUNNINGWITHERROR</b>.</p>
+<p>When a user requests to kill a coordinator job, Oozie puts the job in 
status <b>KILLED</b> and it sends kill to all submitted workflow jobs.</p>
+<p>When a user requests to suspend a coordinator job that is in <b>RUNNING</b> 
status, Oozie puts the job in status <b>SUSPENDED</b> and it suspends all 
submitted workflow jobs. Similarly, when a user requests to suspend a 
coordinator job that is in <b>RUNNINGWITHERROR</b> status, Oozie puts the job 
in status <b>SUSPENDEDWITHERROR</b> and it suspends all submitted workflow 
jobs.</p>
+<p>When the pause time is reached for a coordinator job that is in <b>RUNNING</b> 
status, Oozie puts the job in status <b>PAUSED</b>. Similarly, when the pause time 
is reached for a coordinator job that is in <b>RUNNINGWITHERROR</b> status, Oozie 
puts the job in status <b>PAUSEDWITHERROR</b>.</p>
+<p>Conversely, when a user requests to resume a <b>SUSPENDED</b> coordinator 
job, Oozie puts the job in status <b>RUNNING</b>. Also, when a user requests 
to resume a <b>SUSPENDEDWITHERROR</b> coordinator job, Oozie puts the job in 
status <b>RUNNINGWITHERROR</b>. And when pause time is reset for a coordinator 
job and job status is <b>PAUSED</b>, Oozie puts the job in status 
<b>RUNNING</b>. Also, when the pause time is reset for a coordinator job and 
job status is <b>PAUSEDWITHERROR</b>, Oozie puts the job in status 
<b>RUNNINGWITHERROR</b>.</p>
+<p>A coordinator job creates workflow jobs (commonly coordinator actions) only 
for the duration of the coordinator job and only if the coordinator job is in 
<b>RUNNING</b> status. If the coordinator job has been suspended, when resumed 
it will create all the coordinator actions that should have been created during 
the time it was suspended. Actions will not be lost; they will be delayed.</p>
+<p>When the coordinator job materialization finishes and all workflow jobs 
finish, Oozie updates the coordinator status accordingly. For example, if all 
workflows are <b>SUCCEEDED</b>, Oozie puts the coordinator job into 
<b>SUCCEEDED</b> status. If all workflows are <b>FAILED</b>, Oozie puts the 
coordinator job into <b>FAILED</b> status. If all workflows are <b>KILLED</b>, 
the coordinator job status changes to <b>KILLED</b>. However, if any workflow job 
finishes in a status other than <b>SUCCEEDED</b>, with a combination of <b>KILLED</b>, 
<b>FAILED</b> or <b>TIMEDOUT</b>, Oozie puts the coordinator job into 
<b>DONEWITHERROR</b>. If all coordinator actions are <b>TIMEDOUT</b>, Oozie 
puts the coordinator job into <b>DONEWITHERROR</b>.</p>
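+<p>One way to read the final-status rules in the paragraph above is as a function of the set of finished action statuses. The sketch below is an interpretation of that paragraph, not Oozie's code: the name <tt>final_job_status</tt> is hypothetical, and it assumes every action has already reached a terminal status.</p>

```python
def final_job_status(action_statuses):
    """Derive the coordinator job's final status from its finished actions.

    Assumption: every status passed in is terminal
    (SUCCEEDED, FAILED, KILLED or TIMEDOUT).
    """
    statuses = set(action_statuses)
    if not statuses:
        raise ValueError("at least one action status is required")
    # A uniform outcome maps to the job status of the same name.
    for uniform in ("SUCCEEDED", "FAILED", "KILLED"):
        if statuses == {uniform}:
            return uniform
    # Any mix that includes a non-SUCCEEDED action, or all TIMEDOUT.
    return "DONEWITHERROR"


print(final_job_status(["SUCCEEDED", "SUCCEEDED"]))
print(final_job_status(["SUCCEEDED", "KILLED", "TIMEDOUT"]))
print(final_job_status(["TIMEDOUT", "TIMEDOUT"]))
```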
+<p>A coordinator job in <b>FAILED</b> or <b>KILLED</b> status can be changed 
to <b>IGNORED</b> status. A coordinator job in <b>IGNORED</b> status can be 
changed to <b>RUNNING</b> status.</p></div>
+<div class="section">
+<h4><a name="a6.1.3._Coordinator_Action"></a>6.1.3. Coordinator Action</h4>
+<p>A coordinator job creates and executes coordinator actions.</p>
+<p>A coordinator action is normally a workflow job that consumes and produces 
dataset instances.</p>
+<p>Once a coordinator action is created (this is also referred to as the action 
being materialized), the coordinator action will be in waiting until all 
required inputs for execution are satisfied or until the waiting times out.</p>
+<div class="section">
+<h5><a 
name="a6.1.3.1._Coordinator_Action_Creation_Materialization"></a>6.1.3.1. 
Coordinator Action Creation (Materialization)</h5>
+<p>A coordinator job has one driver event that determines the creation 
(materialization) of its coordinator actions (typically a workflow job).</p>
+<ul>
+
+<li>For synchronous coordinator jobs the driver event is the frequency of the 
coordinator job.</li>
+</ul></div>
+<div class="section">
+<h5><a name="a6.1.3.2._Coordinator_Action_Status"></a>6.1.3.2. Coordinator 
Action Status</h5>
+<p>Once a coordinator action has been created (materialized) the coordinator 
action qualifies for execution. At this point, the action status is 
<b>WAITING</b>.</p>
+<p>A coordinator action in <b>WAITING</b> status must wait until all its input 
events are available before it is ready for execution. When a coordinator action 
is ready for execution its status is <b>READY</b>.</p>
+<p>A coordinator action in <b>WAITING</b> status may timeout before it becomes 
ready for execution. Then the action status is <b>TIMEDOUT</b>.</p>
+<p>A coordinator action may remain in <b>READY</b> status for a while, without 
starting execution, due to the concurrency execution policies of the 
coordinator job.</p>
+<p>A coordinator action in <b>READY</b> or <b>WAITING</b> status changes to 
<b>SKIPPED</b> status if the execution strategy is LAST_ONLY and the current 
time is past the next action&#x2019;s nominal time.  See section 6.3 for more 
details.</p>
+<p>A coordinator action in <b>READY</b> or <b>WAITING</b> status changes to 
<b>SKIPPED</b> status if the execution strategy is NONE and the current time is 
past the action&#x2019;s nominal time + 1 minute.  See section 6.3 for more 
details.</p>
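+<p>The two SKIPPED rules above differ only in the time they compare against. A minimal sketch follows, assuming the strategy names are passed as plain strings and that all times are in the same timezone; the function name <tt>should_skip</tt> is hypothetical.</p>

```python
from datetime import datetime, timedelta


def should_skip(strategy, now, nominal_time, next_nominal_time):
    """Decide whether a READY or WAITING action moves to SKIPPED.

    LAST_ONLY: skip once the current time is past the NEXT action's nominal time.
    NONE:      skip once the current time is past this action's nominal time + 1 minute.
    Other strategies never skip in this sketch.
    """
    if strategy == "LAST_ONLY":
        return now > next_nominal_time
    if strategy == "NONE":
        return now > nominal_time + timedelta(minutes=1)
    return False


nominal = datetime(2009, 1, 1, 10, 0)
next_nominal = datetime(2009, 1, 1, 11, 0)
now = datetime(2009, 1, 1, 10, 5)
print(should_skip("LAST_ONLY", now, nominal, next_nominal))
print(should_skip("NONE", now, nominal, next_nominal))
```

+<p>Five minutes after the nominal time, the action is already skipped under NONE but is still eligible under LAST_ONLY until the next nominal time passes.</p>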
+<p>A coordinator action in <b>READY</b> status changes to <b>SUBMITTED</b> 
status if the total number of current <b>RUNNING</b> and <b>SUBMITTED</b> actions is less 
than the concurrency execution limit.</p>
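+<p>The concurrency check above amounts to counting free slots. The sketch below illustrates it; the function name <tt>actions_to_submit</tt> is hypothetical, and picking READY actions in list order is an assumption, since the actual order depends on the execution strategy.</p>

```python
def actions_to_submit(ready_ids, running_count, submitted_count, concurrency):
    """Return the READY actions that may move to SUBMITTED right now.

    An action may be submitted only while RUNNING + SUBMITTED < concurrency.
    Assumption: READY actions are taken in the order given.
    """
    free_slots = max(0, concurrency - (running_count + submitted_count))
    return ready_ids[:free_slots]


# Hypothetical action ids; 2 RUNNING + 1 SUBMITTED against a limit of 5 leaves 2 slots.
picked = actions_to_submit(["a1", "a2", "a3"], running_count=2, submitted_count=1, concurrency=5)
print(picked)
```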
+<p>A coordinator action in <b>SUBMITTED</b> status changes to <b>RUNNING</b> 
status when the workflow engine starts execution of the coordinator action.</p>
+<p>A coordinator action is in <b>RUNNING</b> status until the associated 
workflow job completes its execution. Depending on the workflow job completion 
status, the coordinator action will be in <b>SUCCEEDED</b>, <b>KILLED</b> or 
<b>FAILED</b> status.</p>
+<p>A coordinator action in <b>WAITING</b>, <b>READY</b>, <b>SUBMITTED</b> or 
<b>RUNNING</b> status can be killed, changing to <b>KILLED</b> status.</p>
+<p>A coordinator action in <b>SUBMITTED</b> or <b>RUNNING</b> status can also 
fail, changing to <b>FAILED</b> status.</p>
+<p>A coordinator action in <b>FAILED</b>, <b>KILLED</b>, or <b>TIMEDOUT</b> 
status can be changed to <b>IGNORED</b> status. A coordinator action in 
<b>IGNORED</b> status can be rerun, changing to <b>WAITING</b> status.</p>
+<p>Valid coordinator action status transitions are:</p>
+<ul>
+
+<li><b>WAITING &#x2013;&gt; READY | TIMEDOUT | SKIPPED | KILLED</b></li>
+<li><b>READY &#x2013;&gt; SUBMITTED | SKIPPED | KILLED</b></li>
+<li><b>SUBMITTED &#x2013;&gt; RUNNING | KILLED | FAILED</b></li>
+<li><b>RUNNING &#x2013;&gt; SUCCEEDED | KILLED | FAILED</b></li>
+<li><b>FAILED | KILLED | TIMEDOUT &#x2013;&gt; IGNORED</b></li>
+<li><b>IGNORED &#x2013;&gt; WAITING</b></li>
+</ul></div></div>
+<div class="section">
+<h4><a name="a6.1.4._Input_Events"></a>6.1.4. Input Events</h4>
+<p>The input events of a coordinator application specify the input conditions 
that are required in order to execute a coordinator action.</p>
+<p>In the current specification input events are restricted to dataset 
instance availability.</p>
+<p>All the dataset instances defined as input events must be available for 
the coordinator action to be ready for execution ( <b>READY</b> status).</p>
+<p>Input events are normally parameterized. For example, the last 24 hourly 
instances of the &#x2018;searchlogs&#x2019; dataset.</p>
+<p>Input events can refer to multiple instances of multiple datasets. For 
example, the last 24 hourly instances of the &#x2018;searchlogs&#x2019; dataset 
and the last weekly instance of the &#x2018;celebrityRumours&#x2019; 
dataset.</p></div>
+<div class="section">
+<h4><a name="a6.1.5._Output_Events"></a>6.1.5. Output Events</h4>
+<p>A coordinator action can produce one or more dataset instances as 
output.</p>
+<p>Dataset instances produced as output by one coordinator action may be 
consumed as input by other coordinator actions of other coordinator 
jobs.</p>
+<p>The chaining of coordinator jobs via the datasets they produce and consume 
is referred to as a <b>data pipeline.</b></p>
+<p>In the current specification coordinator job output events are restricted 
to dataset instances.</p></div>
+<div class="section">
+<h4><a name="a6.1.6._Coordinator_Action_Execution_Policies"></a>6.1.6. 
Coordinator Action Execution Policies</h4>
+<p>The execution policies for the actions of a coordinator job can be defined 
in the coordinator application.</p>
+<ul>
+
+<li>Timeout: A coordinator job can specify the timeout for its coordinator 
actions, that is, how long the coordinator action will be in <i>WAITING</i> or 
<i>READY</i> status before giving up on its execution.</li>
+<li>Concurrency: A coordinator job can specify the concurrency for its 
coordinator actions, that is, how many coordinator actions are allowed to run 
concurrently ( <b>RUNNING</b> status) before the coordinator engine starts 
throttling them.</li>

[... 3826 lines stripped ...]
