Added: websites/staging/oozie/trunk/content/docs/5.0.0/AG_Install.html ============================================================================== --- websites/staging/oozie/trunk/content/docs/5.0.0/AG_Install.html (added) +++ websites/staging/oozie/trunk/content/docs/5.0.0/AG_Install.html Mon Apr 9 14:26:49 2018 @@ -0,0 +1,1625 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at Apr 9, 2018 + | Rendered using Apache Maven Fluido Skin 1.4 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Oozie - </title> + <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" /> + <link rel="stylesheet" href="./css/site.css" /> + <link rel="stylesheet" href="./css/print.css" media="print" /> + + + <script type="text/javascript" src="./js/apache-maven-fluido-1.4.min.js"></script> + + + </head> + <body class="topBarDisabled"> + + + + <div class="container-fluid"> + <div id="banner"> + <div class="pull-left"> + <a href="https://oozie.apache.org/" id="bannerLeft"> + <img src="https://oozie.apache.org/images/oozie_200x.png" alt="Oozie"/> + </a> + </div> + <div class="pull-right"> </div> + <div class="clear"><hr/></div> + </div> + + <div id="breadcrumbs"> + <ul class="breadcrumb"> + + + <li class=""> + <a href="../../" title="Apache"> + Apache</a> + <span class="divider">/</span> + </li> + <li class=""> + <a href="../../" title="Oozie"> + Oozie</a> + <span class="divider">/</span> + </li> + <li class=""> + <a href="../" title="docs"> + docs</a> + <span class="divider">/</span> + </li> + <li class=""> + <a href="./" title="5.0.0"> + 5.0.0</a> + <span class="divider">/</span> + </li> + <li class="active ">Oozie - </li> + + + + <li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2018-04-09</li> + <li id="projectVersion" 
class="pull-right"> + Version: 5.0.0 + </li> + + </ul> + </div> + + + <div class="row-fluid"> + <div id="leftColumn" class="span2"> + <div class="well sidebar-nav"> + + + <ul class="nav nav-list"> + </ul> + + + + <hr /> + + <div id="poweredBy"> + <div class="clear"></div> + <div class="clear"></div> + <div class="clear"></div> + <div class="clear"></div> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> + </a> + </div> + </div> + </div> + + + <div id="bodyColumn" class="span10" > + + <p></p> +<p><a href="./index.html">::Go back to Oozie Documentation Index::</a> +</p> +<a name="Oozie_Installation_and_Configuration"></a> +<div class="section"><h2> Oozie Installation and Configuration</h2> +<p><ul><ul><li><a href="#Basic_Setup">Basic Setup</a> +</li> +<li><a href="#Environment_Setup">Environment Setup</a> +</li> +<li><a href="#Oozie_Server_Setup">Oozie Server Setup</a> +<ul><li><a href="#Setting_Up_Oozie_with_an_Alternate_Tomcat">Setting Up Oozie with an Alternate Tomcat</a> +</li> +</ul> +</li> +<li><a href="#Database_Configuration">Database Configuration</a> +</li> +<li><a href="#Database_Migration">Database Migration</a> +</li> +<li><a href="#Oozie_Configuration">Oozie Configuration</a> +<ul><li><a href="#Oozie_Configuration_Properties">Oozie Configuration Properties</a> +</li> +<li><a href="#Precedence_of_Configuration_Properties">Precedence of Configuration Properties</a> +<ul><li><a href="#Overriding_Configuration_Values">Overriding Configuration Values</a> +</li> +<li><a href="#Prepending_Configuration_Values">Prepending Configuration Values</a> +</li> +</ul> +</li> +<li><a href="#Logging_Configuration">Logging Configuration</a> +</li> +<li><a href="#Oozie_User_Authentication_Configuration">Oozie User Authentication Configuration</a> +</li> +<li><a href="#Oozie_Hadoop_Authentication_Configuration">Oozie Hadoop Authentication Configuration</a> 
+</li> +<li><a href="#User_ProxyUser_Configuration">User ProxyUser Configuration</a> +</li> +<li><a href="#User_Authorization_Configuration">User Authorization Configuration</a> +</li> +<li><a href="#Oozie_System_ID_Configuration">Oozie System ID Configuration</a> +</li> +<li><a href="#Filesystem_Configuration">Filesystem Configuration</a> +</li> +<li><a href="#HCatalog_Configuration">HCatalog Configuration</a> +</li> +<li><a href="#Notifications_Configuration">Notifications Configuration</a> +</li> +<li><a href="#Setting_Up_Oozie_with_HTTPS_SSL">Setting Up Oozie with HTTPS (SSL)</a> +<ul><li><a href="#To_use_a_Self-Signed_Certificate">To use a Self-Signed Certificate</a> +</li> +<li><a href="#To_use_a_Certificate_from_a_Certificate_Authority">To use a Certificate from a Certificate Authority</a> +</li> +<li><a href="#Configure_the_Oozie_Server_to_use_SSL_HTTPS">Configure the Oozie Server to use SSL (HTTPS)</a> +</li> +<li><a href="#Configure_the_Oozie_Client_to_connect_using_SSL_HTTPS">Configure the Oozie Client to connect using SSL (HTTPS)</a> +</li> +<li><a href="#Connect_to_the_Oozie_Web_UI_using_SSL_HTTPS">Connect to the Oozie Web UI using SSL (HTTPS)</a> +</li> +<li><a href="#Additional_considerations_for_Oozie_HA_with_SSL">Additional considerations for Oozie HA with SSL</a> +</li> +</ul> +</li> +<li><a href="#Fine_Tuning_an_Oozie_Server">Fine Tuning an Oozie Server</a> +</li> +<li><a href="#Using_Instrumentation_instead_of_Metrics">Using Instrumentation instead of Metrics</a> +</li> +<li><a href="#High_Availability_HA">High Availability (HA)</a> +<ul><li><a href="#Pre-requisites">Pre-requisites</a> +</li> +<li><a href="#InstallationConfiguration_Steps">Installation/Configuration Steps</a> +</li> +<li><a href="#Security">Security</a> +</li> +<li><a href="#JobId_sequence">JobId sequence</a> +</li> +</ul> +</li> +</ul> +</li> +<li><a href="#Starting_and_Stopping_Oozie">Starting and Stopping Oozie</a> +</li> +<li><a href="#Oozie_Command_Line_Installation">Oozie 
Command Line Installation</a> +</li> +<li><a href="#Oozie_Share_Lib">Oozie Share Lib</a> +</li> +<li><a href="#Oozie_CoordinatorsBundles_Processing_Timezone">Oozie Coordinators/Bundles Processing Timezone</a> +</li> +<li><a href="#MapReduce_Workflow_Uber_Jars">MapReduce Workflow Uber Jars</a> +</li> +<li><a href="#AdvancedCustom_Environment_Settings">Advanced/Custom Environment Settings</a> +</li> +</ul> +</ul> +</p> +<a name="Basic_Setup"></a> +<div class="section"><h3>Basic Setup</h3> +<p>Follow the instructions at <a href="./DG_QuickStart.html">Oozie Quick Start</a> +.</p> +<a name="Environment_Setup"></a> +</div> +<div class="section"><h3>Environment Setup</h3> +<p><b>IMPORTANT:</b> + Oozie ignores any set value for <tt>OOZIE_HOME</tt> +, Oozie computes its home automatically.</p> +<p>When running Oozie with its embedded Jetty server, the <tt>conf/oozie-env.sh</tt> + file can be +used to configure the following environment variables used by Oozie:</p> +<p><b>JETTY_OPTS</b> + : settings for the Embedded Jetty that runs Oozie. Java System properties +for Oozie should be specified in this variable. No default value.</p> +<p><b>OOZIE_CONFIG_FILE</b> + : Oozie configuration file to load from Oozie configuration directory. +Default value <tt>oozie-site.xml</tt> +.</p> +<p><b>OOZIE_LOGS</b> + : Oozie logs directory. Default value <tt>logs/</tt> + directory in the Oozie installation +directory.</p> +<p><b>OOZIE_LOG4J_FILE</b> + : Oozie Log4J configuration file to load from Oozie configuration directory. +Default value <tt>oozie-log4j.properties</tt> +.</p> +<p><b>OOZIE_LOG4J_RELOAD</b> + : Reload interval of the Log4J configuration file, in seconds. +Default value <tt>10</tt> +</p> +<p><b>OOZIE_CHECK_OWNER</b> + : If set to <tt>true</tt> +, Oozie setup/start/run/stop scripts will check that the +owner of the Oozie installation directory matches the user invoking the script. 
The default +value is undefined and interpreted as a <tt>false</tt> +.</p> +<p><b>OOZIE_INSTANCE_ID</b> + : The instance id of the Oozie server. When using HA, each server instance should have a unique instance id. +Default value <tt>${OOZIE_HTTP_HOSTNAME}</tt> +</p> +<a name="Oozie_Server_Setup"></a> +</div> +<div class="section"><h3>Oozie Server Setup</h3> +<p>The <tt>oozie-setup.sh</tt> + script prepares the embedded Jetty server to run Oozie.</p> +<p>The <tt>oozie-setup.sh</tt> + script options are:</p> +<p><pre> +Usage : oozie-setup.sh <Command and OPTIONS> + sharelib create -fs FS_URI [-locallib SHARED_LIBRARY] [-concurrency CONCURRENCY] + (create sharelib for oozie, + FS_URI is the fs.default.name + for hdfs uri; SHARED_LIBRARY, path to the + Oozie sharelib to install, it can be a tarball + or an expanded version of it. If omitted, + the Oozie sharelib tarball from the Oozie + installation directory will be used. + CONCURRENCY is a number of threads to be used + for copy operations. + By default 1 thread will be used) + (action fails if sharelib is already installed + in HDFS) + sharelib upgrade -fs FS_URI [-locallib SHARED_LIBRARY] ([deprecated][use create command to create new version] + upgrade existing sharelib, fails if there + is no existing sharelib installed in HDFS) + db create|upgrade|postupgrade -run [-sqlfile <FILE>] (create, upgrade or postupgrade oozie db with an + optional sql File) + export <file> exports the oozie database to the specified + file in zip format + import <file> imports the oozie database from the zip file + created by export + (without options prints this usage information) +</pre></p> +<p>If a directory <tt>libext/</tt> + is present in Oozie installation directory, the <tt>oozie-setup.sh</tt> + script will +include all JARs in Jetty's <tt>webapp/WEB_INF/lib/</tt> + directory.</p> +<p>If the ExtJS ZIP file is present in the <tt>libext/</tt> + directory, it will be added to the Jetty's <tt>webapp/</tt> + directory as well. 
+The ExtJS library file name must be <tt>ext-2.2.zip</tt> +.</p> +<a name="Setting_Up_Oozie_with_an_Alternate_Tomcat"></a> +<div class="section"><h4>Setting Up Oozie with an Alternate Tomcat</h4> +<p>Use the <tt>addtowar.sh</tt> + script to prepare the Oozie server only if Oozie will run with a different +servlet container than the embedded Jetty provided with the distribution.</p> +<p>The <tt>addtowar.sh</tt> + script adds Hadoop JARs, JDBC JARs and the ExtJS library to the Oozie WAR file.</p> +<p>The <tt>addtowar.sh</tt> + script options are:</p> +<p><pre> + Usage : addtowar <OPTIONS> + Options: -inputwar INPUT_OOZIE_WAR + -outputwar OUTPUT_OOZIE_WAR + [-hadoop HADOOP_VERSION HADOOP_PATH] + [-extjs EXTJS_PATH] + [-jars JARS_PATH] (multiple JAR path separated by ':') + [-secureWeb WEB_XML_PATH] (path to secure web.xml) +</pre></p> +<p>The original <tt>oozie.war</tt> + file is in the Oozie server installation directory.</p> +<p>After the Hadoop JARs and the ExtJS library have been added to the <tt>oozie.war</tt> + file, Oozie is ready to run.</p> +<p>Delete any previous deployment of the <tt>oozie.war</tt> + from the servlet container (if using Tomcat, delete the +<tt>oozie.war</tt> + file and the <tt>oozie</tt> + directory from Tomcat's <tt>webapps/</tt> + directory).</p> +<p>Deploy the prepared <tt>oozie.war</tt> + file (the one that contains the Hadoop JARs and the ExtJS library) in the +servlet container (if using Tomcat, copy the prepared <tt>oozie.war</tt> + file to Tomcat's <tt>webapps/</tt> + directory).</p> +<p><b>IMPORTANT:</b> + Only one Oozie instance can be deployed per Tomcat instance.</p> +<a name="Database_Configuration"></a> +</div> +</div> +<div class="section"><h3>Database Configuration</h3> +<p>Oozie works with HSQL, Derby, MySQL, Oracle, PostgreSQL or SQL Server databases.</p> +<p>By default, Oozie is configured to use Embedded Derby.</p> +<p>Oozie bundles the JDBC drivers for HSQL, Embedded Derby and PostgreSQL.</p> +<p>HSQL is normally used for test cases as it is an 
in-memory database and all data is lost every time Oozie is stopped.</p> +<p>If using Derby, MySQL, Oracle, PostgreSQL, or SQL Server, the Oozie database schema must be created using the <tt>ooziedb.sh</tt> + command +line tool.</p> +<p>If using MySQL, Oracle, or SQL Server, the corresponding JDBC driver JAR file must be copied to Oozie's <tt>libext/</tt> + directory and +it must be added to Oozie WAR file using the <tt>bin/addtowar.sh</tt> + or the <tt>oozie-setup.sh</tt> + scripts using the <tt>-jars</tt> + option.</p> +<p><b>IMPORTANT:</b> + It is recommended to set the database's timezone to GMT (consult your database's documentation on how to do this). +Databases don't handle Daylight Saving Time shifts correctly, and may cause problems if you run any Coordinators with actions +scheduled to materialize during the 1 hour period where we "fall back". For Derby, you can add '-Duser.timezone=GMT' +to <tt>JETTY_OPTS</tt> + in oozie-env.sh to set this. Alternatively, if using MySQL, you can have Oozie use GMT with MySQL without +setting MySQL's timezone to GMT by adding 'useLegacyDatetimeCode=false&serverTimezone=GMT' arguments to the JDBC +URL, <tt>oozie.service.JPAService.jdbc.url</tt> +. 
Be advised that changing the timezone on an existing Oozie database while Coordinators +are already running may cause those Coordinators to shift once, by the offset of their timezone from GMT, after making this change.</p> +<p>The SQL database used by Oozie is configured using the following configuration properties (default values shown):</p> +<p><pre> + oozie.db.schema.name=oozie + oozie.service.JPAService.create.db.schema=false + oozie.service.JPAService.validate.db.connection=false + oozie.service.JPAService.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver + oozie.service.JPAService.jdbc.url=jdbc:derby:${oozie.data.dir}/${oozie.db.schema.name}-db;create=true + oozie.service.JPAService.jdbc.username=sa + oozie.service.JPAService.jdbc.password= + oozie.service.JPAService.pool.max.active.conn=10 +</pre></p> +<p><b>NOTE:</b> + If the <tt>oozie.service.JPAService.create.db.schema</tt> + property is set to <tt>true</tt> + (default value is <tt>false</tt> +) the Oozie tables +will be created automatically without having to use the <tt>ooziedb</tt> + command line tool. Setting this property to + <tt>true</tt> + is recommended only for development.</p> +<p><b>NOTE:</b> + If the <tt>oozie.service.JPAService.create.db.schema</tt> + property is set to <tt>true</tt> +, the <tt>oozie.service.JPAService.validate.db.connection</tt> + +property value is ignored and Oozie handles it as set to <tt>false</tt> +.</p> +<p>Once <tt>oozie-site.xml</tt> + has been configured with the database configuration, execute the <tt>ooziedb.sh</tt> + command line tool to +create the database:</p> +<p><pre> +$ bin/ooziedb.sh create -sqlfile oozie.sql -run + +Validate DB Connection. 
+DONE +Check DB schema does not exist +DONE +Check OOZIE_SYS table does not exist +DONE +Create SQL schema +DONE +DONE +Create OOZIE_SYS table +DONE +Oozie DB has been created for Oozie version '3.2.0' +The SQL commands have been written to: oozie.sql +$ +</pre> +</p> +<p>NOTE: If using MySQL, Oracle, or SQL Server, copy the corresponding JDBC driver JAR file to the <tt>libext/</tt> + directory before running +the <tt>ooziedb.sh</tt> + command line tool.</p> +<p>NOTE: If the '-sqlfile <FILE>' option is used instead of the '-run' option, all the +database changes will be written to the specified file and the database won't be modified.</p> +<p>If using HSQL there is no need to use the <tt>ooziedb</tt> + command line tool as HSQL is an in-memory database. Use the +following configuration properties in the oozie-site.xml:</p> +<p><pre> + oozie.db.schema.name=oozie + oozie.service.JPAService.create.db.schema=true + oozie.service.JPAService.validate.db.connection=false + oozie.service.JPAService.jdbc.driver=org.hsqldb.jdbcDriver + oozie.service.JPAService.jdbc.url=jdbc:hsqldb:mem:${oozie.db.schema.name} + oozie.service.JPAService.jdbc.username=sa + oozie.service.JPAService.jdbc.password= + oozie.service.JPAService.pool.max.active.conn=10 +</pre></p> +<p>To fine-tune how Oozie retries database operations after database connectivity failures or errors, you can +change the following properties. Their default values are:</p> +<p><pre> + oozie.service.JPAService.retry.initial-wait-time.ms=100 + oozie.service.JPAService.retry.maximum-wait-time.ms=30000 + oozie.service.JPAService.retry.max-retries=10 +</pre></p> +<p>If you set either <tt>oozie.service.JPAService.retry.max-retries</tt> + or <tt>oozie.service.JPAService.retry.maximum-wait-time.ms</tt> + to <tt>0</tt> +, +no retry attempts will be made on any database connectivity issues. 
Exact settings for these properties also depend on how much load +is on Oozie regarding workflow and coordinator jobs.</p> +<p>The database operation retry functionality kicks in when there is a <tt>javax.persistence.PersistenceException</tt> + whose root cause is not +part of normal everyday operation. The root cause is filtered against a blacklist consisting of descendants like <tt>NoSuchResultException</tt> +, +<tt>NonUniqueResultException</tt> +, and the like. This way Oozie won't retry database operations on errors that are more related to the +current query, or are otherwise part of everyday operation. It also ensures that the blacklist is database agnostic.</p> +<p>This has been tested with a MySQL database failing for 10 seconds every minute, running an Oozie coordinator job of an Oozie workflow consisting of four +workflow actions (some of them asynchronous). In this setup Oozie recovered after each and every database outage.</p> +<p>To set up such a failing MySQL scenario, perform the following steps:</p> +<p><ul><li>Set <tt>oozie.service.JPAService.connection.data.source</tt> + to <tt>org.apache.oozie.util.db.BasicDataSourceWrapper</tt> + within <tt>oozie-site.xml</tt> +</li> +<li>Set <tt>oozie.service.JPAService.jdbc.driver</tt> + to <tt>org.apache.oozie.util.db.FailingMySQLDriverWrapper</tt> + within <tt>oozie-site.xml</tt> +</li> +<li>Restart Oozie server</li> +<li>Submit / start some workflows, coordinators etc.</li> +<li>See how Oozie is retrying on injected database errors by looking at the Oozie server logs, grepping <tt>JPAException</tt> + instances +with the following message prefix: <pre>Deliberately failing to prepare statement.</pre></li> +</ul> +</p> +<a name="Database_Migration"></a> +</div> +<div class="section"><h3>Database Migration</h3> +<p>Oozie provides an easy way to switch between databases without losing any data. Oozie servers should be stopped during the +database migration process. 
+The export of the database can be done using the following command: +<pre> +$ bin/oozie-setup.sh export /tmp/oozie_db.zip +1 rows exported from OOZIE_SYS +50 rows exported from WF_JOBS +340 rows exported from WF_ACTIONS +10 rows exported from COORD_JOBS +70 rows exported from COORD_ACTIONS +0 rows exported from BUNDLE_JOBS +0 rows exported from BUNDLE_ACTIONS +0 rows exported from SLA_REGISTRATION +0 rows exported from SLA_SUMMARY +</pre></p> +<p>The database configuration is read from <tt>oozie-site.xml</tt> +. After updating the configuration to point to the new database, +the tables have to be created with <tt>ooziedb.sh</tt> +, as described in the <a href="./AG_Install.html#Database_Configuration">Database configuration</a> + +section above. +Once the tables are created, they can be filled with data using the following command:</p> +<p><pre> +$ bin/oozie-setup.sh import /tmp/oozie_db.zip +Loading to Oozie database version 3 +50 rows imported to WF_JOBS +340 rows imported to WF_ACTIONS +10 rows imported to COORD_JOBS +70 rows imported to COORD_ACTIONS +0 rows imported to BUNDLE_JOBS +0 rows imported to BUNDLE_ACTIONS +0 rows imported to SLA_REGISTRATION +0 rows imported to SLA_SUMMARY +</pre></p> +<p>NOTE: The database version of the zip must match the version of the Oozie database it's imported to.</p> +<p>After starting the Oozie server, the history and the currently running workflows should be available.</p> +<p><b>IMPORTANT:</b> + The tool was primarily developed to make the migration from embedded databases (e.g. Derby) to standalone databases + (e.g. MySQL, PostgreSQL, Oracle, MS SQL Server), though it will work between any supported databases. +It is <b>not</b> + optimized to handle databases over 1 GB. 
If the database is larger than that, it should be purged before migration.</p> +<a name="Oozie_Configuration"></a> +</div> +<div class="section"><h3>Oozie Configuration</h3> +<p>By default, Oozie configuration is read from Oozie's <tt>conf/</tt> + directory.</p> +<p>The Oozie configuration is distributed in 3 different files:</p> +<p><ul><li><tt>oozie-site.xml</tt> + : Oozie server configuration</li> +<li><tt>oozie-log4j.properties</tt> + : Oozie logging configuration</li> +<li><tt>adminusers.txt</tt> + : Oozie admin users list</li> +</ul> +</p> +<a name="Oozie_Configuration_Properties"></a> +<div class="section"><h4>Oozie Configuration Properties</h4> +<p>All Oozie configuration properties and their default values are defined in the <tt>oozie-default.xml</tt> + file.</p> +<p>Oozie resolves configuration property values in the following order:</p> +<p><ul><li>If a Java System property is defined, it uses its value</li> +<li>Else, if the Oozie configuration file (<tt>oozie-site.xml</tt> +) contains the property, it uses its value</li> +<li>Else, it uses the default value documented in the <tt>oozie-default.xml</tt> + file</li> +</ul> +</p> +<p><b>NOTE:</b> + The <tt>oozie-default.xml</tt> + file found in Oozie's <tt>conf/</tt> + directory is not used by Oozie; it is there +for reference purposes only.</p> +<a name="Precedence_of_Configuration_Properties"></a> +</div> +<div class="section"><h4>Precedence of Configuration Properties</h4> +<p>For compatibility reasons across Hadoop / Oozie versions, some configuration properties can be defined using multiple keys +in the launcher configuration. 
Beginning with Oozie 5.0.0, some of them can be overridden, while others will be prepended to default +configuration values.</p> +<a name="Overriding_Configuration_Values"></a> +<div class="section"><h5>Overriding Configuration Values</h5> +<p>Overriding happens for the following configuration entries with the <tt>oozie.launcher</tt> + prefix, and is controlled by the <tt>oozie.launcher.override</tt> + +switch (on by default).</p> +<p>For those, the general approach is as follows:<ul><li>check whether a YARN compatible entry is present. If yes, use it to override default value</li> +<li>check whether a MapReduce v2 compatible entry is present. If yes, use it to override default value</li> +<li>check whether a MapReduce v1 compatible entry is present. If yes, use it to override default value</li> +<li>use default value</li> +</ul> +</p> +<p>Such properties are (legend: YARN / MapReduce v2 / MapReduce v1):<ul><li>max attempts of the MapReduce Application Master:<ul><li>N / A</li> +<li><tt>mapreduce.map.maxattempts</tt> +</li> +<li><tt>mapred.map.max.attempts</tt> +</li> +</ul> +</li> +<li>memory amount in MB of the MapReduce Application Master:<ul><li><tt>yarn.app.mapreduce.am.resource.mb</tt> +</li> +<li><tt>mapreduce.map.memory.mb</tt> +</li> +<li><tt>mapred.job.map.memory.mb</tt> +</li> +</ul> +</li> +<li>CPU vcore count of the MapReduce Application Master:<ul><li><tt>yarn.app.mapreduce.am.resource.cpu-vcores</tt> +</li> +<li><tt>mapreduce.map.cpu.vcores</tt> +</li> +<li>N / A</li> +</ul> +</li> +<li>logging level of the MapReduce Application Master:<ul><li>N / A</li> +<li><tt>mapreduce.map.log.level</tt> +</li> +<li><tt>mapred.map.child.log.level</tt> +</li> +</ul> +</li> +<li>MapReduce Application Master JVM options:<ul><li><tt>yarn.app.mapreduce.am.command-opts</tt> +</li> +<li><tt>mapreduce.map.java.opts</tt> +</li> +<li><tt>mapred.child.java.opts</tt> +</li> +</ul> +</li> +<li>MapReduce Application Master environment variable settings:<ul><li><tt>yarn.app.mapreduce.am.env</tt> +</li> 
+<li><tt>mapreduce.map.env</tt> +</li> +<li><tt>mapred.child.env</tt> +</li> +</ul> +</li> +<li>MapReduce Application Master job priority:<ul><li>N / A</li> +<li><tt>mapreduce.job.priority</tt> +</li> +<li><tt>mapred.job.priority</tt> +</li> +</ul> +</li> +<li>MapReduce Application Master job queue name:<ul><li>N / A</li> +<li><tt>mapreduce.job.queuename</tt> +</li> +<li><tt>mapred.job.queue.name</tt> +</li> +</ul> +</li> +<li>MapReduce View ACL settings:<ul><li>N / A</li> +<li><tt>mapreduce.job.acl-view-job</tt> +</li> +<li>N / A</li> +</ul> +</li> +<li>MapReduce Modify ACL settings:<ul><li>N / A</li> +<li><tt>mapreduce.job.acl-modify-job</tt> +</li> +<li>N / A</li> +</ul> +</li> +</ul> +</p> +<p>This list can be extended or modified by adding new configuration entries or updating existing values +beginning with <tt>oozie.launcher.override.</tt> + within <tt>oozie-site.xml</tt> +. Examples can be found in <tt>oozie-default.xml</tt> +.</p> +<a name="Prepending_Configuration_Values"></a> +</div> +<div class="section"><h5>Prepending Configuration Values</h5> +<p>Prepending happens for the following configuration entries with the <tt>oozie.launcher</tt> + prefix, and is controlled by the <tt>oozie.launcher.prepend</tt> + +switch (on by default).</p> +<p>For those, the general approach is as follows:<ul><li>check whether a YARN compatible entry is present. If yes, use it to prepend to default value</li> +<li>use default value</li> +</ul> +</p> +<p>Such properties are (legend: YARN only):<ul><li>MapReduce Application Master JVM options: <tt>yarn.app.mapreduce.am.admin-command-opts</tt> +</li> +<li>MapReduce Application Master environment settings: <tt>yarn.app.mapreduce.am.admin.user.env</tt> +</li> +</ul> +</p> +<p>This list can be extended or modified by adding new configuration entries or updating existing values +beginning with <tt>oozie.launcher.prepend.</tt> + within <tt>oozie-site.xml</tt> +. 
Examples can be found in <tt>oozie-default.xml</tt> +.</p> +<a name="Logging_Configuration"></a> +</div> +</div> +<div class="section"><h4>Logging Configuration</h4> +<p>By default, Oozie log configuration is defined in the <tt>oozie-log4j.properties</tt> + configuration file.</p> +<p>If the Oozie log configuration file changes, Oozie reloads the new settings automatically.</p> +<p>By default, Oozie logs to Oozie's <tt>logs/</tt> + directory.</p> +<p>Oozie logs to 4 different files:</p> +<p><ul><li>oozie.log: web services log streaming works from this log</li> +<li>oozie-ops.log: messages for Admin/Operations to monitor</li> +<li>oozie-instrumentation.log: instrumentation data, every 60 seconds (configurable)</li> +<li>oozie-audit.log: audit messages, workflow jobs changes</li> +</ul> +</p> +<p>The embedded Jetty and embedded Derby log files are also written to Oozie's <tt>logs/</tt> + directory.</p> +<a name="Oozie_User_Authentication_Configuration"></a> +</div> +<div class="section"><h4>Oozie User Authentication Configuration</h4> +<p>Oozie supports Kerberos HTTP SPNEGO authentication, pseudo/simple authentication and anonymous access +for client connections.</p> +<p>Anonymous access (<b>default</b> +) does not require the user to authenticate. The user ID is obtained from +the job properties on job submission operations; other operations are anonymous.</p> +<p>Pseudo/simple authentication requires the user to specify the user name on the request. This is done by +the PseudoAuthenticator class by injecting the <tt>user.name</tt> + parameter in the query string of all requests. 
+The <tt>user.name</tt> + parameter value is taken from the client process Java System property <tt>user.name</tt> +.</p> +<p>Kerberos HTTP SPNEGO authentication requires the user to perform a Kerberos HTTP SPNEGO authentication sequence.</p> +<p>If Pseudo/simple or Kerberos HTTP SPNEGO authentication mechanisms are used, Oozie will return to the user an +authentication token HTTP Cookie that can be used in later requests as identity proof.</p> +<p>Oozie uses the Apache Hadoop-Auth (Java HTTP SPNEGO) library for authentication. +This library can be extended to support other authentication mechanisms.</p> +<p>Oozie user authentication is configured using the following configuration properties (default values shown):</p> +<p><pre> + oozie.authentication.type=simple + oozie.authentication.token.validity=36000 + oozie.authentication.signature.secret= + oozie.authentication.cookie.domain= + oozie.authentication.simple.anonymous.allowed=true + oozie.authentication.kerberos.principal=HTTP/localhost@${local.realm} + oozie.authentication.kerberos.keytab=${oozie.service.HadoopAccessorService.keytab.file} +</pre></p> +<p>The <tt>type</tt> + defines the authentication used for the Oozie HTTP endpoint. The supported values are: +simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#.</p> +<p>The <tt>token.validity</tt> + indicates how long (in seconds) an authentication token is valid before it has +to be renewed.</p> +<p>The <tt>signature.secret</tt> + is the signature secret for signing the authentication tokens. It is recommended to not set this, in which +case Oozie will randomly generate one on startup.</p> +<p>The <tt>oozie.authentication.cookie.domain</tt> + is the domain to use for the HTTP cookie that stores the +authentication token. For authentication to work correctly across the web consoles of all Hadoop nodes, +the domain must be correctly set.</p> +<p>The <tt>simple.anonymous.allowed</tt> + indicates if anonymous requests are allowed. 
This setting is meaningful +only when using 'simple' authentication.</p> +<p>The <tt>kerberos.principal</tt> + indicates the Kerberos principal to be used for the HTTP endpoint. +The principal MUST start with 'HTTP/' as per the Kerberos HTTP SPNEGO specification.</p> +<p>The <tt>kerberos.keytab</tt> + indicates the location of the keytab file with the credentials for the principal. +It should be the same keytab file Oozie uses for its Kerberos credentials for Hadoop.</p> +<a name="Oozie_Hadoop_Authentication_Configuration"></a> +</div> +<div class="section"><h4>Oozie Hadoop Authentication Configuration</h4> +<p>Oozie works with Hadoop versions which support Kerberos authentication.</p> +<p>Oozie Hadoop authentication is configured using the following configuration properties (default values shown):</p> +<p><pre> + oozie.service.HadoopAccessorService.kerberos.enabled=false + local.realm=LOCALHOST + oozie.service.HadoopAccessorService.keytab.file=${user.home}/oozie.keytab + oozie.service.HadoopAccessorService.kerberos.principal=${user.name}/localhost@${local.realm} +</pre></p> +<p>The above default values are for a Hadoop 0.20 secure distribution (with support for Kerberos authentication).</p> +<p>To enable Kerberos authentication, the following property must be set:</p> +<p><pre> + oozie.service.HadoopAccessorService.kerberos.enabled=true +</pre></p> +<p>When using Kerberos authentication, the following properties must be set to the correct values (default values shown):</p> +<p><pre> + local.realm=LOCALHOST + oozie.service.HadoopAccessorService.keytab.file=${user.home}/oozie.keytab + oozie.service.HadoopAccessorService.kerberos.principal=${user.name}/localhost@${local.realm} +</pre></p> +<p><b>IMPORTANT:</b> + When using Oozie with a Hadoop 0.20 with Security distribution, the Oozie user in Hadoop must be configured +as a proxy user.</p> +<a name="User_ProxyUser_Configuration"></a> +</div> +<div class="section"><h4>User ProxyUser Configuration</h4> +<p>Oozie supports 
impersonation or proxyuser functionality (identical to Hadoop proxyuser capabilities and conceptually +similar to Unix 'sudo').</p> +<p>Proxyuser enables other systems that are Oozie clients to submit jobs on behalf of other users.</p> +<p>Because proxyuser is a powerful capability, Oozie provides the following restriction capabilities +(similar to Hadoop):</p> +<p><ul><li>Proxyuser is an explicit configuration on per proxyuser user basis.</li> +<li>A proxyuser user can be restricted to impersonate other users from a set of hosts.</li> +<li>A proxyuser user can be restricted to impersonate users belonging to a set of groups.</li> +</ul> +</p> +<p>There are 2 configuration properties needed to set up a proxyuser:</p> +<p><ul><li>oozie.service.ProxyUserService.proxyuser.#USER#.hosts: hosts from where the user #USER# can impersonate other users.</li> +<li>oozie.service.ProxyUserService.proxyuser.#USER#.groups: groups the users being impersonated by user #USER# must belong to.</li> +</ul> +</p> +<p>Both properties support the '*' wildcard as value. 
However, this is recommended only for testing/development.</p> +<a name="User_Authorization_Configuration"></a> +</div> +<div class="section"><h4>User Authorization Configuration</h4> +<p>Oozie has a basic authorization model:</p> +<p><ul><li>Users have read access to all jobs</li> +<li>Users have write access to their own jobs</li> +<li>Users have write access to jobs based on an Access Control List (list of users and groups)</li> +<li>Users have read access to admin operations</li> +<li>Admin users have write access to all jobs</li> +<li>Admin users have write access to admin operations</li> +</ul> +</p> +<p>If security is disabled, all users are admin users.</p> +<p>Oozie security is set via the following configuration property (default value shown):</p> +<p><pre> + oozie.service.AuthorizationService.security.enabled=false +</pre></p> +<p>NOTE: the old ACL model where a group was provided is still supported if the following property is set +in <tt>oozie-site.xml</tt> +:</p> +<p><pre> + oozie.service.AuthorizationService.default.group.as.acl=true +</pre></p> +<p>Admin users are determined from the list of admin groups, specified in the + <tt>oozie.service.AuthorizationService.admin.groups</tt> + property. Use commas to separate multiple groups; spaces, tabs, +and newline characters are trimmed.</p> +<p>If the above property for admin groups is not set, then the admin users are the users specified in the + <tt>conf/adminusers.txt</tt> + file.
The syntax of this file is:</p> +<p><ul><li>One user name per line</li> +<li>Empty lines and lines starting with '#' are ignored</li> +</ul> +</p> +<a name="Oozie_System_ID_Configuration"></a> +</div> +<div class="section"><h4>Oozie System ID Configuration</h4> +<p>Oozie has a system ID that is used to generate the Oozie temporary runtime directory, the workflow job IDs, and the +workflow action IDs.</p> +<p>Two Oozie systems running with the same ID will not conflict, but when troubleshooting it is easier +to identify resources created/used by the different Oozie systems if they have different system IDs (default value +shown):</p> +<p><pre> + oozie.system.id=oozie-${user.name} +</pre></p> +<a name="Filesystem_Configuration"></a> +</div> +<div class="section"><h4>Filesystem Configuration</h4> +<p>Oozie lets you configure the allowed filesystems by using the following configuration property in oozie-site.xml: +<pre> + <property> + <name>oozie.service.HadoopAccessorService.supported.filesystems</name> + <value>hdfs</value> + </property> +</pre></p> +<p>The above value, <tt>hdfs</tt> +, which is the default, means that Oozie will only allow HDFS filesystems to be used. Examples of +filesystems that Oozie is compatible with are: hdfs, hftp, webhdfs, and viewfs. Multiple filesystems can be specified as +comma-separated values. Putting a * will allow any filesystem type, effectively disabling this check.</p> +<a name="HCatalog_Configuration"></a> +</div> +<div class="section"><h4>HCatalog Configuration</h4> +<p>Refer to the <a href="./DG_HCatalogIntegration.html">Oozie HCatalog Integration</a> + document for an overview of HCatalog and +integration of Oozie with HCatalog.
This section explains the various settings to be configured in oozie-site.xml on +the Oozie server to enable Oozie to work with HCatalog.</p> +<p><b>Adding HCatalog jars to Oozie war:</b> +</p> +<p>For the Oozie server to talk to the HCatalog server, HCatalog and hive jars need to be in the server classpath. +hive-site.xml, which has the configuration to talk to the HCatalog server, also needs to be in the classpath or specified by the +following configuration property in oozie-site.xml: +<pre> + <property> + <name>oozie.service.HCatAccessorService.hcat.configuration</name> + <value>/local/filesystem/path/to/hive-site.xml</value> + </property> +</pre> +The hive-site.xml can also be placed in a location on HDFS and the above property can have a value +of <tt><a href="./hdfs://HOST:PORT/path/to/hive-site.xml.html">hdfs://HOST:PORT/path/to/hive-site.xml</a> +</tt> + to point there instead of the local file system.</p> +<p>The oozie-[version]-hcataloglibs.tar.gz in the oozie distribution bundles the required hcatalog and hive jars that +need to be placed in the Oozie server classpath. If using a version of HCatalog bundled in +Oozie hcataloglibs/, copy the corresponding HCatalog jars from hcataloglibs/ to the libext/ directory. If using a +different version of HCatalog, copy the required HCatalog jars from that version to the libext/ directory. +This needs to be done before running the <tt>oozie-setup.sh</tt> + script so that these jars get added for Oozie.</p> +<p><b>Configure HCatalog URI Handling:</b> +</p> +<p><pre> + <property> + <name>oozie.service.URIHandlerService.uri.handlers</name> + <value>org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler</value> + <description> + Enlist the different uri handlers supported for data availability checks. + </description> + </property> +</pre></p> +<p>The above configuration defines the different uri handlers which check for the existence of data dependencies defined in a +Coordinator.
The default value is <tt>org.apache.oozie.dependency.FSURIHandler</tt> +. FSURIHandler supports uris with +schemes defined in the configuration <tt>oozie.service.HadoopAccessorService.supported.filesystems</tt> + which are hdfs, hftp +and webhdfs by default. HCatURIHandler supports uris with the hcat scheme.</p> +<p><b>Configure HCatalog services:</b> +</p> +<p><pre> + <property> + <name>oozie.services.ext</name> + <value> + org.apache.oozie.service.JMSAccessorService, + org.apache.oozie.service.PartitionDependencyManagerService, + org.apache.oozie.service.HCatAccessorService + </value> + <description> + To add/replace services defined in 'oozie.services' with custom implementations. + Class names must be separated by commas. + </description> + </property> +</pre></p> +<p>PartitionDependencyManagerService and HCatAccessorService are required to work with HCatalog and support Coordinators +having HCatalog uris as data dependency. If the HCatalog server is configured to publish partition availability +notifications to a JMS compliant messaging provider like ActiveMQ, then JMSAccessorService needs to be added +to <tt>oozie.services.ext</tt> + to handle those notifications.</p> +<p><b>Configure JMS Provider JNDI connection mapping for HCatalog:</b> +</p> +<p><pre> + <property> + <name>oozie.service.HCatAccessorService.jmsconnections</name> + <value> + hcat://hcatserver.colo1.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.colo1.com:61616, + default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://broker.colo.com:61616;connectionFactoryNames#ConnectionFactory + </value> + <description> + Specify the map of endpoints to JMS configuration properties. In general, endpoint + identifies the HCatalog server URL. "default" is used if no endpoint is mentioned + in the query. If some JMS property is not defined, the system will use the property + defined in jndi.properties.
The jndi.properties file is retrieved from the application classpath. + Mapping rules can also be provided for mapping HCatalog servers to corresponding JMS providers. + hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616 + </description> + </property> +</pre></p> +<p>Currently HCatalog does not provide APIs to get the connection details to connect to the JMS Provider it publishes +notifications to. It only has APIs which provide the topic name in the JMS Provider to which the notifications are +published for a given database table. So the JMS Provider's connection properties need to be manually configured +in Oozie using the above setting. You can either provide a <tt>default</tt> + JNDI configuration which will be used as the +JMS Provider for all HCatalog servers, specify a configuration per HCatalog server URL, or provide a +configuration based on a rule matching multiple HCatalog server URLs. For example: With the configuration of +hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616, +a request URL of hcat://server1.colo1.com:8020 will map to tcp://broker.colo1.com:61616, hcat://server2.colo2.com:8020 +will map to tcp://broker.colo2.com:61616 and so on.</p> +<p><b>Configure HCatalog Polling Frequency:</b> +</p> +<p><pre> + <property> + <name>oozie.service.coord.push.check.requeue.interval + </name> + <value>600000</value> + <description>Command re-queue interval for push dependencies (in milliseconds). + </description> + </property> +</pre></p> +<p>If there is no JMS Provider configured for an HCatalog Server, then oozie polls HCatalog based on the frequency defined +in <tt>oozie.service.coord.input.check.requeue.interval</tt> +. This config also applies to HDFS polling.
+If there is a JMS provider configured for an HCatalog Server, then oozie polls HCatalog based on the frequency defined +in <tt>oozie.service.coord.push.check.requeue.interval</tt> + as a fallback. +The defaults for <tt>oozie.service.coord.input.check.requeue.interval</tt> + and <tt>oozie.service.coord.push.check.requeue.interval</tt> + +are 1 minute and 10 minutes respectively.</p> +<a name="Notifications_Configuration"></a> +</div> +<div class="section"><h4>Notifications Configuration</h4> +<p>Oozie supports publishing notifications to a JMS Provider for job status changes and SLA met and miss events. For +more information on the feature, refer to the <a href="./DG_JMSNotifications.html">JMS Notifications</a> + documentation. Oozie can also send email +notifications on SLA misses.</p> +<p><ul><li><b>Message Broker Installation</b> +: <br/></li> +</ul> +For Oozie to send/receive messages, a JMS-compliant broker should be installed. Apache ActiveMQ is a popular JMS-compliant +broker usable for this purpose. See <a class="externalLink" href="http://activemq.apache.org/getting-started.html">here</a> + for instructions on +installing and running ActiveMQ.</p> +<p><ul><li><b>Services</b> +: <br/></li> +</ul> +Add/modify the <tt>oozie.services.ext</tt> + property in <tt>oozie-site.xml</tt> + to include the following services.
+ <pre> + <property> + <name>oozie.services.ext</name> + <value> + org.apache.oozie.service.JMSAccessorService, + org.apache.oozie.service.JMSTopicService, + org.apache.oozie.service.EventHandlerService, + org.apache.oozie.sla.service.SLAService + </value> + </property> + </pre></p> +<p><ul><li><b>Event Handlers</b> +: <br/></li> +</ul> +<pre> + <property> + <name>oozie.service.EventHandlerService.event.listeners</name> + <value> + org.apache.oozie.jms.JMSJobEventListener, + org.apache.oozie.sla.listener.SLAJobEventListener, + org.apache.oozie.jms.JMSSLAEventListener, + org.apache.oozie.sla.listener.SLAEmailEventListener + </value> + </property> + </pre> + It is also recommended to increase <tt>oozie.service.SchedulerService.threads</tt> + to 15 for faster event processing and sending of notifications. The listeners and their functions are as follows: <br/> + JMSJobEventListener - Sends JMS job notifications <br/> + JMSSLAEventListener - Sends JMS SLA notifications <br/> + SLAEmailEventListener - Sends Email SLA notifications <br/> + SLAJobEventListener - Processes job events and calculates SLA. Does not send any notifications<ul><li><b>JMS properties</b> +: <br/></li> +</ul> +Add the <tt>oozie.jms.producer.connection.properties</tt> + property in <tt>oozie-site.xml</tt> +. Its value corresponds to an +identifier (e.g. default) assigned to a semicolon-separated key#value list of properties from your JMS broker's +<tt>jndi.properties</tt> file.
The important properties are <tt>java.naming.factory.initial</tt> + and <tt>java.naming.provider.url</tt> +.</p> +<p>As an example, if using ActiveMQ in a local environment, the property can be set to + <pre> + <property> + <name>oozie.jms.producer.connection.properties</name> + <value> + java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory + </value> + </property> + </pre><ul><li><b>JMS Topic name</b> +: <br/></li> +</ul> +JMS consumers listen on a particular "topic". Hence Oozie needs to define a topic variable with which to publish messages +about the various jobs. + <pre> + <property> + <name>oozie.service.JMSTopicService.topic.name</name> + <value> + default=${username} + </value> + <description> + Topic options are ${username}, ${jobId}, or a fixed string which can be specified as default or for a + particular job type. + For example, to have a fixed string topic for workflows, coordinators and bundles, + specify in the following comma-separated format: {jobtype1}={some_string1}, {jobtype2}={some_string2} + where job type can be WORKFLOW, COORDINATOR or BUNDLE. + The following example defines topics for workflow job, workflow action, coordinator job, coordinator action, + bundle job and bundle action + WORKFLOW=workflow, + COORDINATOR=coordinator, + BUNDLE=bundle + For jobs with no defined topic, the default topic will be ${username} + </description> + </property> + </pre></p> +<p>Another related property is the topic prefix. + <pre> + <property> + <name>oozie.service.JMSTopicService.topic.prefix</name> + <value></value> + <description> + This can be used to add a prefix to the topic in oozie.service.JMSTopicService.topic.name. For example: oozie.
+ </description> + </property> + </pre></p> +<a name="Setting_Up_Oozie_with_HTTPS_SSL"></a> +</div> +<div class="section"><h4>Setting Up Oozie with HTTPS (SSL)</h4> +<p><b>IMPORTANT</b> +: +The default HTTPS configuration will cause all Oozie URLs to use HTTPS except for the JobTracker callback URLs. This simplifies +configuration (no changes are needed outside of Oozie) and is acceptable because Oozie doesn't inherently trust the callbacks anyway; +they are used only as hints.</p> +<p>The related environment variables are explained at <a href="./AG_Install.html#Environment_Setup">Environment Setup</a> +.</p> +<p>You can use either a certificate from a Certificate Authority or a Self-Signed Certificate. Using a self-signed certificate +requires some additional configuration on each Oozie client machine. If possible, a certificate from a Certificate Authority is +recommended because it's simpler to configure.</p> +<p>There are also some additional considerations when using Oozie HA with HTTPS.</p> +<a name="To_use_a_Self-Signed_Certificate"></a> +<div class="section"><h5>To use a Self-Signed Certificate</h5> +<p>There are many ways to create a Self-Signed Certificate; this is just one way. We will be using +the <a class="externalLink" href="http://docs.oracle.com/javase/6/docs/technotes/tools/solaris/keytool.html">keytool</a> + program, which is +included with your JRE. If it's not on your path, you should be able to find it in $JAVA_HOME/bin.</p> +<p>1. Run the following command (as the Oozie user) to create the keystore file, which will be named <tt>.keystore</tt> + and located in the +Oozie user's home directory. +<pre> +keytool -genkeypair -alias jetty -keyalg RSA -dname "CN=hostname" -storepass password -keypass password +</pre> +The <tt>hostname</tt> + should be the host name of the Oozie Server or a wildcard on the subdomain it belongs to. Make sure to include +the "CN=" part.
You can change <tt>storepass</tt> + and <tt>keypass</tt> + values, but they should be the same. If you do want to use something +other than password, you'll also need to change the value of the <tt>oozie.https.keystore.pass</tt> + property in <tt>oozie-site.xml</tt> + to +match; <tt>password</tt> + is the default.</p> +<p>For example, if your Oozie server was at oozie.int.example.com, then you would do this: +<pre> +keytool -genkeypair -alias jetty -keyalg RSA -dname "CN=oozie.int.example.com" -storepass password -keypass password +</pre> +If you're going to be using Oozie HA, it's simplest if you have a single certificate that all Oozie servers in the HA group can use. +To do that, you'll need to use a wildcard on the subdomain it belongs to: +<pre> +keytool -genkeypair -alias jetty -keyalg RSA -dname "CN=*.int.example.com" -storepass password -keypass password +</pre> +The above would work on any server in the int.example.com domain.</p> +<p>2. Run the following command (as the Oozie user) to export a certificate file from the keystore file: +<pre> +keytool -exportcert -alias jetty -file path/to/anywhere/certificate.cert -storepass password +</pre></p> +<p>3. Run the following command (as any user) to create a truststore containing the certificate we just exported: +<pre> +keytool -import -alias jetty -file path/to/certificate.cert -keystore /path/to/anywhere/oozie.truststore -storepass password2 +</pre> +You'll need the <tt>oozie.truststore</tt> + later if you're using the Oozie client (or other Java-based client); otherwise, you can skip +this step. The <tt>storepass</tt> + value here is only used to verify or change the truststore and isn't typically required when only +reading from it; so it does not have to be given to users only using the client.</p> +<a name="To_use_a_Certificate_from_a_Certificate_Authority"></a> +</div> +<div class="section"><h5>To use a Certificate from a Certificate Authority</h5> +<p>1. 
You will need to make a request to a Certificate Authority in order to obtain a proper Certificate; please consult a Certificate +Authority on this procedure. If you're going to be using Oozie HA, it's simplest if you have a single certificate that all Oozie +servers in the HA group can use. To do that, you'll need to use a wildcard on the subdomain it belongs to (e.g. "*.int.example.com").</p> +<p>2. Once you have your .cert file, run the following command (as the Oozie user) to create a keystore file from your certificate: +<pre> +keytool -import -alias jetty -file path/to/certificate.cert +</pre> +The keystore file will be named <tt>.keystore</tt> + and located in the Oozie user's home directory.</p> +<a name="Configure_the_Oozie_Server_to_use_SSL_HTTPS"></a> +</div> +<div class="section"><h5>Configure the Oozie Server to use SSL (HTTPS)</h5> +<p>1. Make sure the Oozie server isn't running</p> +<p>2. Configure settings necessary for enabling SSL/TLS support in <tt>oozie-site.xml</tt> +.</p> +<p>2a. Set <tt>oozie.https.enabled</tt> + to <tt>true</tt> +. To revert to HTTP, set <tt>oozie.https.enabled</tt> + to <tt>false</tt> +. +2b. Set the location and password for the keystore and the location for the truststore by setting <tt>oozie.https.keystore.file</tt> +, +<tt>oozie.https.keystore.pass</tt>, <tt>oozie.https.truststore.file</tt> +.</p> +<p><b>Note:</b> + <tt>oozie.https.truststore.file</tt> + can be overridden by setting the <tt>javax.net.ssl.trustStore</tt> + system property.</p> +<p>The default HTTPS port Oozie listens on for secure connections is 11443; it can be changed via <tt>oozie.https.port</tt> +.</p> +<p>It is possible to specify other HTTPS settings via <tt>oozie-site.xml</tt> +: +- To include / exclude cipher suites, set <tt>oozie.https.include.cipher.suites</tt> + / <tt>oozie.https.exclude.cipher.suites</tt> +. +- To include / exclude TLS protocols, set <tt>oozie.https.include.protocols</tt> + / <tt>oozie.https.exclude.protocols</tt> +.
+<b>Note:</b> Exclude is always preferred over include (i.e. if you both include and exclude an entity, it will be excluded).</p> +<p>3. Start the Oozie server</p> +<p><b>Note:</b> + If using Oozie HA, make sure that each Oozie server has a copy of the .keystore file.</p> +<a name="Configure_the_Oozie_Client_to_connect_using_SSL_HTTPS"></a> +</div> +<div class="section"><h5>Configure the Oozie Client to connect using SSL (HTTPS)</h5> +<p>The first two steps are only necessary if you are using a Self-Signed Certificate; the third is required either way. +Also, these steps must be done on every machine where you intend to use the Oozie Client.</p> +<p>1. Copy or download the oozie.truststore file onto the client machine</p> +<p>2. When using any Java-based program, you'll need to pass <tt>-Djavax.net.ssl.trustStore</tt> + to the JVM. To +do this for the Oozie client: +<pre> +export OOZIE_CLIENT_OPTS='-Djavax.net.ssl.trustStore=/path/to/oozie.truststore' +</pre></p> +<p>3. When using the Oozie Client, you will need to use <a class="externalLink" href="https://oozie.server.hostname:11443/oozie">https://oozie.server.hostname:11443/oozie</a> + instead of +http://oozie.server.hostname:11000/oozie -- Java will not automatically redirect from the http address to the https address.</p> +<a name="Connect_to_the_Oozie_Web_UI_using_SSL_HTTPS"></a> +</div> +<div class="section"><h5>Connect to the Oozie Web UI using SSL (HTTPS)</h5> +<p>1. Use https://oozie.server.hostname:11443/oozie +though most browsers should automatically redirect you if you use http://oozie.server.hostname:11000/oozie</p> +<p><b>IMPORTANT</b> +: If using a Self-Signed Certificate, your browser will warn you that it can't verify the certificate or something +similar.
You will probably have to add your certificate as an exception.</p> +<a name="Additional_considerations_for_Oozie_HA_with_SSL"></a> +</div> +<div class="section"><h5>Additional considerations for Oozie HA with SSL</h5> +<p>You'll need to configure the load balancer to do SSL pass-through. This will allow the clients talking to Oozie to use the +SSL certificate provided by the Oozie servers (so the load balancer does not need one). Please consult your load balancer's +documentation on how to configure this. Make sure to point the load balancer at the <a class="externalLink" href="https://HOST:HTTPS_PORT">https://HOST:HTTPS_PORT</a> + addresses for your +Oozie servers. Clients can then connect to the load balancer at <a class="externalLink" href="https://LOAD_BALANCER_HOST:PORT.">https://LOAD_BALANCER_HOST:PORT.</a> +</p> +<p><b>Important:</b> + Callbacks from the ApplicationMaster are done via http or https depending on what you enter for the +<tt>OOZIE_BASE_URL</tt> property. If you are using a Certificate from a Certificate Authority, you can simply put the https address here. +If you are using a self-signed certificate, you have to choose one of the following options (Option 1 is recommended):</p> +<p>Option 1) You'll need to follow the steps in +the <a href="./AG_Install.html#Configure_the_Oozie_Client_to_connect_using_SSL_HTTPS">Configure the Oozie Client to connect using SSL (HTTPS)</a> + +section, but on the host of the ApplicationMaster. You can then set <tt>OOZIE_BASE_URL</tt> + to the load balancer https address. +This will allow the ApplicationMaster to contact the Oozie server with https (like the Oozie client, it is also a Java +program).</p> +<p>Option 2) You'll need to set up another load balancer, or another "pool" on the existing load balancer, with the http addresses of the +Oozie servers. You can then set <tt>OOZIE_BASE_URL</tt> + to the load balancer http address. Clients should use the https load balancer +address.
This will allow clients to use https while the ApplicationMaster uses http for callbacks.</p> +<a name="Fine_Tuning_an_Oozie_Server"></a> +</div> +</div> +<div class="section"><h4>Fine Tuning an Oozie Server</h4> +<p>Refer to the <a href="./oozie-default.xml">oozie-default.xml</a> + for details.</p> +<a name="Using_Instrumentation_instead_of_Metrics"></a> +</div> +<div class="section"><h4>Using Instrumentation instead of Metrics</h4> +<p>As of version 4.1.0, Oozie includes a replacement for the Instrumentation based on Codahale's Metrics library. It includes a +number of improvements over the original Instrumentation included in Oozie. They both report most of the same information, though +the formatting is slightly different and there's some additional information in the Metrics version; the format of the output to the +oozie-instrumentation log is also different.</p> +<p>As of version 5.0.0, <tt>MetricsInstrumentationService</tt> + is the default; it is listed in <tt>oozie.services</tt> +: + <pre> + <property> + <name>oozie.services</name> + <value> + ... + org.apache.oozie.service.MetricsInstrumentationService, + ... + </value> + </property> + </pre></p> +<p>The deprecated <tt>InstrumentationService</tt> + can be enabled by adding an <tt>InstrumentationService</tt> + reference to the list of +<tt>oozie.services.ext</tt>: + <pre> + <property> + <name>oozie.services.ext</name> + <value> + ... + org.apache.oozie.service.InstrumentationService, + ...
+ </value> + </property> + </pre></p> +<p>By default the <tt>admin/instrumentation</tt> + REST endpoint is no longer available and instead the <tt>admin/metrics</tt> + endpoint can +be used (see the <a href="./WebServicesAPI.html#Oozie_Metrics">Web Services API</a> + documentation for more details); the Oozie Web UI also replaces +the "Instrumentation" tab with a "Metrics" tab.</p> +<p>If the deprecated <tt>InstrumentationService</tt> + is used, the <tt>admin/instrumentation</tt> + REST endpoint gets enabled and the <tt>admin/metrics</tt> + +REST endpoint is no longer available (see the <a href="./WebServicesAPI.html#Oozie_Metrics">Web Services API</a> + documentation for more details); +the Oozie Web UI also replaces the "Metrics" tab with the "Instrumentation" tab.</p> +<p>We can also publish the instrumentation metrics to an external Graphite or Ganglia server. For this the following +properties should be specified in oozie-site.xml: + <pre> + <property> + <name>oozie.external_monitoring.enable</name> + <value>false</value> + <description> + If the oozie functional metrics need to be exposed to the metrics-server backend, set it to true. + If set to true, the following properties have to be specified : oozie.metrics.server.name, + oozie.metrics.host, oozie.metrics.prefix, oozie.metrics.report.interval.sec, oozie.metrics.port + </description> + </property> <property> + <name>oozie.external_monitoring.type</name> + <value>graphite</value> + <description> + The name of the server to which we want to send the metrics; either graphite or ganglia.
+ </description> + </property> + <property> + <name>oozie.external_monitoring.address</name> + <value>http://localhost:2020</value> + </property> + <property> + <name>oozie.external_monitoring.metricPrefix</name> + <value>oozie</value> + </property> + <property> + <name>oozie.external_monitoring.reporterIntervalSecs</name> + <value>60</value> + </property> + </pre> +</p> +<p>We can also publish the instrumentation metrics via the JMX interface. For this the following property should be specified +in oozie-site.xml: + <pre> + <property> + <name>oozie.jmx_monitoring.enable</name> + <value>false</value> + <description> + If the oozie functional metrics need to be exposed via the JMX interface, set it to true. + </description> + </property> + </pre></p> +<p><a name="HA"></a> +</p> +<a name="High_Availability_HA"></a> +</div> +<div class="section"><h4>High Availability (HA)</h4> +<p>Multiple Oozie Servers can be configured against the same database to provide High Availability (HA) of the Oozie service.</p> +<a name="Pre-requisites"></a> +<div class="section"><h5>Pre-requisites</h5> +<p>1. A database that supports multiple concurrent connections. In order to have full HA, the database should also have HA support, or +it becomes a single point of failure.</p> +<p><b>NOTE:</b> + The default derby database does not support this.</p> +<p>2. A ZooKeeper ensemble.</p> +<p>Apache ZooKeeper is a distributed, open-source coordination service for distributed applications; the Oozie servers use it for +coordinating access to the database and communicating with each other. In order to have full HA, there should be at least 3 +ZooKeeper servers. +More information on ZooKeeper can be found <a class="externalLink" href="http://zookeeper.apache.org">here</a> +.</p> +<p>3. Multiple Oozie servers.</p> +<p><b>IMPORTANT:</b> + While not strictly required for all configuration properties, all of the servers should ideally have exactly the same +configuration for consistency's sake.</p> +<p>4.
A Loadbalancer, Virtual IP, or Round-Robin DNS.</p> +<p>This is used to provide a single entry-point for users and for callbacks from the JobTracker/ResourceManager. The load balancer +should be configured for round-robin between the Oozie servers to distribute the requests. Users (using either the Oozie client, a +web browser, or the REST API) should connect through the load balancer. In order to have full HA, the load balancer should also +have HA support, or it becomes a single point of failure.</p> +<a name="InstallationConfiguration_Steps"></a> +</div> +<div class="section"><h5>Installation/Configuration Steps</h5> +<p>1. Install identically configured Oozie servers normally. Make sure they are all configured against the same database and make sure +that you DO NOT start them yet.</p> +<p>2. Add the following services to the extension services configuration property in oozie-site.xml in all Oozie servers. This will +make Oozie use the ZooKeeper versions of these services instead of the default implementations.</p> +<p><pre> +<property> + <name>oozie.services.ext</name> + <value> + org.apache.oozie.service.ZKLocksService, + org.apache.oozie.service.ZKXLogStreamingService, + org.apache.oozie.service.ZKJobsConcurrencyService, + org.apache.oozie.service.ZKUUIDService + </value> +</property> +</pre></p> +<p>3. Add the following property to oozie-site.xml in all Oozie servers. It should be a comma-separated list of host:port pairs of the +ZooKeeper servers. The default value is shown below.</p> +<p><pre> +<property> + <name>oozie.zookeeper.connection.string</name> + <value>localhost:2181</value> +</property> +</pre></p> +<p>4. (Optional) Add the following property to oozie-site.xml in all Oozie servers to specify the namespace to use. All of the Oozie +Servers that are planning on talking to each other should have the same namespace. If there are multiple Oozie setups each doing +their own HA, they should have their own namespace. 
The default value is shown below.</p> +<p><pre> +<property> + <name>oozie.zookeeper.namespace</name> + <value>oozie</value> +</property> +</pre></p> +<p>5. Change the value of <tt>OOZIE_BASE_URL</tt> + in oozie-site.xml to point to the load balancer or virtual IP, for example:</p> +<p><pre> +<property> + <name>oozie.base.url</name> + <value>http://my.loadbalancer.hostname:11000/oozie</value> +</property> +</pre></p> +<p>6. (Optional) If using a secure cluster, see <a href="./AG_Install.html#Security">Security</a> + below on configuring Kerberos with Oozie HA.</p> +<p>7. Start the ZooKeeper servers.</p> +<p>8. Start the Oozie servers.</p> +<p>Note: If one of the Oozie servers becomes unavailable, querying Oozie for the logs from a job in the Web UI, REST API, or client may +be missing information until that server comes back up.</p> +<a name="Security"></a> +</div> +<div class="section"><h5>Security</h5> +<p>Oozie HA works with the existing Oozie security framework and settings. For HA features (log streaming, share lib, etc.) to work +properly in a secure setup, the following property can be set on each server. If <tt>oozie.server.authentication.type</tt> + is not set, then +server-server authentication will fall back on <tt>oozie.authentication.type</tt> +.</p> +<p><pre> +<property> + <name>oozie.server.authentication.type</name> + <value>kerberos</value> +</property> +</pre></p> +<p>Below are some additional steps and information specific to Oozie HA:</p> +<p>1. (Optional) To prevent unauthorized users or programs from interacting with or reading the znodes used by Oozie in ZooKeeper, +you can tell Oozie to use Kerberos-backed ACLs. To enforce this for all of the Oozie-related znodes, simply add the following +property to oozie-site.xml in all Oozie servers and set it to <tt>true</tt> +.
The default is <tt>false</tt> +.</p> +<p><pre> +<property> + <name>oozie.zookeeper.secure</name> + <value>true</value> +</property> +</pre></p> +<p>Note: The Kerberos principals of each of the Oozie servers should have the same primary name (i.e. in <tt>primary/instance@REALM</tt> +, each +server should have the same value for <tt>primary</tt> +).</p> +<p><b>Important:</b> + Once this property is set to <tt>true</tt> +, it will set the ACLs on all existing Oozie-related znodes to only allow Kerberos +authenticated users with a principal that has the same primary as described above (also for any subsequently created new znodes). +This means that if you ever want to turn this feature off, you will have to manually connect to ZooKeeper using a Kerberos principal +with the same primary and delete all znodes under and including the namespace (i.e. if <tt>oozie.zookeeper.namespace</tt> + <tt> =oozie</tt> + +then that would be <tt>/oozie</tt> +); alternatively, instead of deleting them all, you can manually set all of their ACLs to <tt>world:anyone</tt> +. +In either case, make sure that no Oozie servers are running while this is being done.</p> +<p>Also, in your zoo.cfg for ZooKeeper, make sure to set the following properties: +<pre> +authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider +kerberos.removeHostFromPrincipal=true +kerberos.removeRealmFromPrincipal=true +</pre></p> +<p>2. In Hadoop versions prior to 2.5.0, there is a known limitation where each Oozie server can only use one HTTP principal. However, +for Oozie HA, we need to use two HTTP principals: <tt>HTTP/oozie-server-host@realm</tt> + and <tt>HTTP/load-balancer-host@realm</tt> +. This +allows access to each Oozie server directly and through the load balancer. While users should always go through the load balancer, +certain features (e.g.
log streaming) require the Oozie servers to talk to each other directly; it can also be helpful for an +administrator to talk directly to an Oozie server. So, if using a Hadoop version prior to 2.5.0, you will have to choose which +HTTP principal to use as you cannot use both; it is recommended to choose <tt>HTTP/load-balancer-host@realm</tt> + so users can connect +through the load balancer. This will prevent Oozie servers from talking to each other directly, which will effectively disable +log streaming.</p> +<p>For Hadoop 2.5.0 and later:</p> +<p>2a. When creating the keytab used by Oozie, make sure to include Oozie's principal and the two HTTP principals mentioned above.</p> +<p>2b. Set <tt>oozie.authentication.kerberos.principal</tt> + to * (that is, an asterisk) so it will use both HTTP principals.</p> +<p>For earlier versions of Hadoop:</p> +<p>2a. When creating the keytab used by Oozie, make sure to include Oozie's principal and the load balancer HTTP principal.</p> +<p>2b. Set <tt>oozie.authentication.kerberos.principal</tt> + to <tt>HTTP/load-balancer-host@realm</tt> +.</p> +<p>3. With Hadoop 2.6.0 and later, a rolling random secret that is synchronized across all Oozie servers will be used for signing the +Oozie auth tokens. This is done automatically when HA is enabled; no additional configuration is needed.</p> +<p>For earlier versions of Hadoop, each server will have a different random secret. This will still work but will likely result in +additional calls to the KDC to authenticate users to the Oozie server (because the auth tokens will not be accepted by other +servers, which will cause a fallback to Kerberos).</p> +<p>4. If you'd like to use HTTPS (SSL) with Oozie HA, there are some additional considerations to take into account. 
+See the <a href="./AG_Install.html#Setting_Up_Oozie_with_HTTPS_SSL">Setting Up Oozie with HTTPS (SSL)</a> + section for more information.</p> +<a name="JobId_sequence"></a> +</div> +<div class="section"><h5>JobId sequence</h5> +<p>In HA mode, Oozie uses ZooKeeper to generate the job ID sequence. Job IDs have the following format: +<Id sequence>-<yyMMddHHmmss(server start time)>-<system_id>-<W/C/B></p> +<p>Here, <systemId> is configured as <tt>oozie.system.id</tt> + (default is "oozie-" + "user.name"), and +W/C/B is a suffix indicating whether the job is a workflow, coordinator, or bundle.</p> +<p>The maximum number of characters allowed for the job ID sequence is 40. "Id sequence" is stored in ZooKeeper and reset to 0 once the maximum job ID sequence is +reached. The maximum job ID sequence is configured as <tt>oozie.service.ZKUUIDService.jobid.sequence.max</tt> +; the default value is 99999999990.</p> +<p><pre> +<property> + <name>oozie.service.ZKUUIDService.jobid.sequence.max</name> + <value>99999999990</value> +</property> +</pre></p> +<a name="Starting_and_Stopping_Oozie"></a> +</div> +</div> +</div> +<div class="section"><h3>Starting and Stopping Oozie</h3> +<p>Use the standard commands to start and stop Oozie.</p> +<a name="Oozie_Command_Line_Installation"></a> +</div> +<div class="section"><h3>Oozie Command Line Installation</h3> +<p>Copy and expand the <tt>oozie-client</tt> + TAR.GZ file bundled with the distribution. Add the <tt>bin/</tt> + directory to the <tt>PATH</tt> +.</p> +<p>Refer to the <a href="./DG_CommandLineTool.html">Command Line Interface Utilities</a> + document for a full reference of the <tt>oozie</tt> + +command line tool.</p> +<a name="Oozie_Share_Lib"></a> +</div> +<div class="section"><h3>Oozie Share Lib</h3> +<p>The Oozie sharelib TAR.GZ file bundled with the distribution contains the necessary files to run Oozie map-reduce streaming, pig, +hive, sqoop, and distcp actions. There is also a sharelib for HCatalog. 
The sharelib is required for these actions to work; any +other actions (mapreduce, shell, ssh, and java) do not require the sharelib to be installed.</p> +<p>As of Oozie 4.0, the following property is included. If true, Oozie will create and ship a "launcher jar" to HDFS that contains +classes necessary for the launcher job. If false, Oozie will not do this, and it is assumed that the necessary classes are in their +respective sharelib jars or the "oozie" sharelib instead. When false, the sharelib is required for ALL actions; when true, the +sharelib is only required for actions that need additional jars (the original list from above).</p> +<p><pre> +<property> + <name>oozie.action.ship.launcher.jar</name> + <value>true</value> +</property> +</pre></p> +<p>When the sharelib CLI is used, sharelib files are copied to a new lib_<timestamped> directory. At startup, the server picks the sharelib from the latest +time-stamp directory. While starting, the server also purges sharelib directories that are older than the sharelib retention days +(defined as oozie.service.ShareLibService.temp.sharelib.retention.days; the default is 7 days).</p> +<p>A sharelib mapping file can also be configured. The configured file is a key-value mapping, where the key is the sharelib name for the +action and the value is a comma-separated list of DFS or local filesystem directories or jar files. Local filesystem refers to the local +filesystem of the node where the Oozie launcher is running. This can be configured in oozie-site.xml as: + <pre> + <!-- OOZIE --> + <property> + <name>oozie.service.ShareLibService.mapping.file</name> + <value></value> + <description> + Sharelib mapping files contains list of key=value, + where key will be the sharelib name for the action and value is a comma separated list of + DFS or local filesystem directories or jar files. + Example. 
+ oozie.pig_10=hdfs:///share/lib/pig/pig-0.10.1/lib/ + oozie.pig=hdfs:///share/lib/pig/pig-0.11.1/lib/ + oozie.distcp=hdfs:///share/lib/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-distcp-2.2.0.jar + oozie.spark=hdfs:///share/lib/spark/lib/,hdfs:///share/lib/spark/python/lib/pyspark.zip,hdfs:///share/lib/spark/python/lib/py4j-0-9-src.zip + oozie.hive=file:///usr/local/oozie/share/lib/hive/ + </description> + </property> + </pre></p> +<p>Example mapping file with local filesystem resources:</p> +<p><pre> + <property> + <name>oozie.service.ShareLibService.mapping.file</name> + <value> + oozie.distcp=file:///usr/local/oozie/share/lib/distcp + oozie.hcatalog=file:///usr/local/oozie/share/lib/hcatalog + oozie.hive=file:///usr/local/oozie/share/lib/hive + oozie.hive2=file:///usr/local/oozie/share/lib/hive2 + oozie.mapreduce-streaming=file:///usr/local/oozie/share/lib/mapreduce-streaming + oozie.oozie=file:///usr/local/oozie/share/lib/oozie + oozie.pig=file:///usr/local/oozie/share/lib/pig + oozie.spark=file:///usr/local/oozie/share/lib/spark + oozie.sqoop=file:///usr/local/oozie/share/lib/sqoop + </value> + </property> + </pre></p> +<p>If you are using local filesystem resources in the mapping file, make sure the corresponding jars are already deployed to +all the nodes where Oozie launcher jobs will be executed, and that the files are readable by the launchers. To do this, you +can extract the Oozie sharelib TAR.GZ file in the directory of your choice on the nodes, and set the permissions of the files accordingly.</p> +<p>The Oozie sharelib TAR.GZ file bundled with the distribution does not contain the pyspark and py4j zip files since they vary +with the Apache Spark version. Therefore, to run pySpark using the Spark Action, users need to specify the pyspark and py4j zip +files. 
These files can be added to the workflow's lib/ directory, to the sharelib, or to the sharelib mapping file.</p> +<a name="Oozie_CoordinatorsBundles_Processing_Timezone"></a> +</div> +<div class="section"><h3>Oozie Coordinators/Bundles Processing Timezone</h3> +<p>By default Oozie runs coordinator and bundle jobs using <tt>UTC</tt> + timezone for datetime values specified in the application +XML and in the job parameter properties. This includes coordinator applications' job start and end times, coordinator +datasets' initial-instance, and bundle applications' kickoff times. In addition, coordinator dataset instance URI templates +will be resolved using datetime values of the Oozie processing timezone.</p> +<p>It is possible to set the Oozie processing timezone to a timezone that is an offset of UTC. Alternate timezones must be +expressed using a GMT offset ( <tt>GMT+/-####</tt> + ). For example: <tt>GMT+0530</tt> + (India timezone).</p> +<p>To change the default <tt>UTC</tt> + timezone, use the <tt>oozie.processing.timezone</tt> + property in the <tt>oozie-site.xml</tt> +. For example:</p> +<p><pre> +<configuration> + <property> + <name>oozie.processing.timezone</name> + <value>GMT+0530</value> + </property> +</configuration> +</pre></p> +<p><b>IMPORTANT:</b> + If using a processing timezone other than <tt>UTC</tt> +, all datetime values in coordinator and bundle jobs must +be expressed in the corresponding timezone, for example <tt>2012-08-08T12:42+0530</tt> +.</p> +<p><b>NOTE:</b> + It is strongly encouraged to use <tt>UTC</tt> +, the default Oozie processing timezone.</p> +<p>For more details on using an alternate Oozie processing timezone, please refer to the +<a href="./CoordinatorFunctionalSpec.html#datetime">Coordinator Functional Specification, section '4. 
Datetime'</a> +</p> +<p><a name="UberJar"></a> +</p> +<a name="MapReduce_Workflow_Uber_Jars"></a> +</div> +<div class="section"><h3>MapReduce Workflow Uber Jars</h3> +<p>For Map-Reduce jobs (not including streaming or pipes), additional jar files can also be included via an uber jar. An uber jar is a +jar file that contains additional jar files within a "lib" folder (see +<a href="./WorkflowFunctionalSpec.html#AppDeployment">Workflow Functional Specification</a> + for more information). Submitting a workflow with an uber jar +requires at least Hadoop 2.2.0 or 1.2.0. As such, using uber jars in a workflow is disabled by default. To enable this feature, use +the <tt>oozie.action.mapreduce.uber.jar.enable</tt> + property in the <tt>oozie-site.xml</tt> + (and make sure to use a supported version of Hadoop).</p> +<p><pre> +<configuration> + <property> + <name>oozie.action.mapreduce.uber.jar.enable</name> + <value>true</value> + </property> +</configuration> +</pre></p> +<a name="AdvancedCustom_Environment_Settings"></a> +</div> +<div class="section"><h3>Advanced/Custom Environment Settings</h3> +<p>Oozie can be configured to use the Unix standard filesystem hierarchy for its different files +(configuration, logs, data and temporary files).</p> +<p>These settings must be done in the <tt>bin/oozie-env.sh</tt> + script.</p> +<p>This script is sourced before the configuration <tt>oozie-env.sh</tt> + and supports additional +environment variables (shown with their default values):</p> +<p><pre> +export OOZIE_CONFIG=${OOZIE_HOME}/conf +export OOZIE_DATA=${OOZIE_HOME}/data +export OOZIE_LOG=${OOZIE_HOME}/logs +export JETTY_OUT=${OOZIE_LOG}/jetty.out +export JETTY_PID=/tmp/oozie.pid +</pre></p> +<p>Sample values to make Oozie follow the Unix standard filesystem hierarchy:</p> +<p><pre> +export OOZIE_CONFIG=/etc/oozie +export OOZIE_DATA=/var/lib/oozie +export OOZIE_LOG=/var/log/oozie +export JETTY_PID=/tmp/oozie.pid +</pre></p> +<p><a href="./index.html">::Go back to Oozie Documentation 
Index::</a> +</p> +<p></p> +</div> + + </div> + </div> + </div> + + <hr/> + + <footer> + <div class="container-fluid"> + <div class="row-fluid"> + <p >Copyright © 2018 + <a href="http://www.apache.org">Apache Software Foundation</a>. + All rights reserved. + + </p> + </div> + + + </div> + </footer> + </body> +</html>
