http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_SparkActionExtension.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_SparkActionExtension.md b/docs/src/site/markdown/DG_SparkActionExtension.md
new file mode 100644
index 0000000..5a56cca
--- /dev/null
+++ b/docs/src/site/markdown/DG_SparkActionExtension.md
@@ -0,0 +1,436 @@

[::Go back to Oozie Documentation Index::](index.html)

-----

# Oozie Spark Action Extension

<!-- MACRO{toc|fromDepth=1|toDepth=4} -->

## Spark Action

The `spark` action runs a Spark job.

The workflow job will wait until the Spark job completes before continuing to the next action.

To run the Spark job, you have to configure the `spark` action with the `resource-manager`, `name-node`, and Spark `master` elements, as well as the necessary arguments and configuration.

Spark options can be specified in an element called `spark-opts`.

A `spark` action can be configured to create or delete HDFS directories before starting the Spark job.

Oozie EL expressions can be used in the inline configuration. Property values specified in the `configuration` element override values specified in the `job-xml` file.

**Syntax:**


```
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="[NODE-NAME]">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>[RESOURCE-MANAGER]</resource-manager>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
                <delete path="[PATH]"/>
                ...
                <mkdir path="[PATH]"/>
                ...
            </prepare>
            <job-xml>[SPARK SETTINGS FILE]</job-xml>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <master>[SPARK MASTER URL]</master>
            <mode>[SPARK MODE]</mode>
            <name>[SPARK JOB NAME]</name>
            <class>[SPARK MAIN CLASS]</class>
            <jar>[SPARK DEPENDENCIES JAR / PYTHON FILE]</jar>
            <spark-opts>[SPARK-OPTIONS]</spark-opts>
            <arg>[ARG-VALUE]</arg>
            ...
            <arg>[ARG-VALUE]</arg>
            ...
        </spark>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
```

The `prepare` element, if present, indicates a list of paths to delete or create before starting the job. Specified paths must start with `hdfs://HOST:PORT`.

The `job-xml` element, if present, specifies a file containing configuration for the Spark job. Multiple `job-xml` elements are allowed in order to specify multiple `job.xml` files.

The `configuration` element, if present, contains configuration properties that are passed to the Spark job.

The `master` element indicates the URL of the Spark Master. Ex: `spark://host:port`, `mesos://host:port`, yarn-cluster, yarn-client, or local.

The `mode` element, if present, indicates the Spark deployment mode, i.e. where the Spark driver program runs. Ex: client, cluster. This is typically not required because you can specify it as part of `master` (i.e. master=yarn, mode=client is equivalent to master=yarn-client). A local `master` always runs in client mode.

Depending on the `master` (and `mode`) entered, the Spark job will run differently as follows:

 * local mode: everything runs here in the Launcher Job.
 * yarn-client mode: the driver runs here in the Launcher Job and the executors in Yarn.
 * yarn-cluster mode: the driver and executors run in Yarn.

The `name` element indicates the name of the Spark application.

The `class` element, if present, indicates the Spark application's main class.

The `jar` element indicates a comma-separated list of jars or Python files.

The `spark-opts` element, if present, contains a list of Spark options that can be passed to Spark.
Spark configuration options can be passed by specifying '--conf key=value' or other Spark CLI options. Values containing whitespace can be enclosed in double quotes.

Some examples of the `spark-opts` element:

 * '--conf key=value'
 * '--conf key1=value1 value2'
 * '--conf key1="value1 value2"'
 * '--conf key1=value1 key2="value2 value3"'
 * '--conf key=value --verbose --properties-file user.properties'

There are several ways to define properties that will be passed to Spark. They are processed in the following order:

 * propagated from `oozie.service.SparkConfigurationService.spark.configurations`
 * read from a localized `spark-defaults.conf` file
 * read from a file defined in `spark-opts` via the `--properties-file` option
 * properties defined in the `spark-opts` element

(Entries later in this list take precedence over earlier ones.) The server-propagated properties, the `spark-defaults.conf` and the user-defined properties file are merged together into a single properties file, as Spark handles only one file in its `--properties-file` option.

The `arg` element, if present, contains arguments that can be passed to the Spark application.

In case some property values are present both in `spark-defaults.conf` and as property key/value pairs generated by Oozie, the user-configured values from `spark-defaults.conf` are prepended to the ones generated by Oozie, as part of the Spark arguments list.

The following properties are prepended to the Spark arguments:

 * `spark.executor.extraClassPath`
 * `spark.driver.extraClassPath`
 * `spark.executor.extraJavaOptions`
 * `spark.driver.extraJavaOptions`

All the above elements can be parameterized (templatized) using EL expressions.

**Example:**


```
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="myfirstsparkjob">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>foo:8032</resource-manager>
            <name-node>bar:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>local[*]</master>
            <mode>client</mode>
            <name>Spark Example</name>
            <class>org.apache.spark.examples.mllib.JavaALS</class>
            <jar>/lib/spark-examples_2.10-1.1.0.jar</jar>
            <spark-opts>--executor-memory 20G --num-executors 50
             --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>
            <arg>inputpath=hdfs://localhost/input/file.txt</arg>
            <arg>value=2</arg>
        </spark>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>
```

### Spark Action Logging

Spark action logs are redirected to the STDOUT/STDERR of the Oozie Launcher map-reduce job task that runs Spark.

From the Oozie web-console, using the 'Console URL' link in the Spark action pop-up, it is possible to navigate to the Oozie Launcher map-reduce job task logs via the Hadoop job-tracker web-console.

### Spark on YARN

To ensure that your Spark job shows up in the Spark History Server, make sure to specify these three Spark configuration properties either in `spark-opts` with `--conf` or from `oozie.service.SparkConfigurationService.spark.configurations` in oozie-site.xml:

1. spark.yarn.historyServer.address=SPH-HOST:18088

2. spark.eventLog.dir=`hdfs://NN:8020/user/spark/applicationHistory`

3. spark.eventLog.enabled=true

### PySpark with Spark Action

To submit PySpark scripts with Spark Action, pyspark dependencies must be available in the sharelib or in the workflow's lib/ directory.
For more information, please refer to the [installation document](AG_Install.html#Oozie_Share_Lib).

**Example:**


```
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
    ....
    <action name="myfirstpysparkjob">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>foo:8032</resource-manager>
            <name-node>bar:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>yarn-cluster</master>
            <name>Spark Example</name>
            <jar>pi.py</jar>
            <spark-opts>--executor-memory 20G --num-executors 50
             --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>
            <arg>100</arg>
        </spark>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>
```

The `jar` element indicates the Python file. Refer to the file by its localized name, because only local files are allowed in PySpark. The py file should be in the lib/ folder next to the workflow.xml, or added using the `file` element, so that it is localized to the working directory with just its name.

### Using Symlink in \<jar\>

A symlink must be specified using the [file](WorkflowFunctionalSpec.html#a3.2.2.1_Adding_Files_and_Archives_for_the_Job) element. Then, you can use the symlink name in the `jar` element.

**Example:**

Specifying a relative path for the symlink:

Make sure that the file is within the application directory, i.e. `oozie.wf.application.path`.

```
        <spark xmlns="uri:oozie:spark-action:1.0">
        ...
            <jar>py-spark-example-symlink.py</jar>
        ...
        ...
            <file>py-spark.py#py-spark-example-symlink.py</file>
        ...
        </spark>
```

Specifying the full path for the symlink:

```
        <spark xmlns="uri:oozie:spark-action:1.0">
        ...
            <jar>spark-example-symlink.jar</jar>
        ...
        ...
            <file>hdfs://localhost:8020/user/testjars/all-oozie-examples.jar#spark-example-symlink.jar</file>
        ...
        </spark>
```



## Appendix, Spark XML-Schema

### AE.A Appendix A, Spark XML-Schema

#### Spark Action Schema Version 1.0

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:spark="uri:oozie:spark-action:1.0" elementFormDefault="qualified"
           targetNamespace="uri:oozie:spark-action:1.0">

    <xs:include schemaLocation="oozie-common-1.0.xsd"/>

    <xs:element name="spark" type="spark:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:choice>
                <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/>
                <xs:element name="resource-manager" type="xs:string" minOccurs="0" maxOccurs="1"/>
            </xs:choice>
            <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="launcher" type="spark:LAUNCHER" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
            <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>
```

#### Spark Action Schema Version 0.2

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:spark="uri:oozie:spark-action:0.2" elementFormDefault="qualified"
           targetNamespace="uri:oozie:spark-action:0.2">

    <xs:element name="spark" type="spark:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
            <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="CONFIGURATION">
        <xs:sequence>
            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="PREPARE">
        <xs:sequence>
            <xs:element name="delete" type="spark:DELETE" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="mkdir" type="spark:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="DELETE">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

    <xs:complexType name="MKDIR">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

</xs:schema>
```

#### Spark Action Schema Version 0.1

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:spark="uri:oozie:spark-action:0.1" elementFormDefault="qualified"
           targetNamespace="uri:oozie:spark-action:0.1">

    <xs:element name="spark" type="spark:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
            <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="CONFIGURATION">
        <xs:sequence>
            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="PREPARE">
        <xs:sequence>
            <xs:element name="delete" type="spark:DELETE" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="mkdir" type="spark:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="DELETE">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

    <xs:complexType name="MKDIR">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

</xs:schema>
```
[::Go back to Oozie Documentation Index::](index.html)

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_SqoopActionExtension.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_SqoopActionExtension.md b/docs/src/site/markdown/DG_SqoopActionExtension.md
new file mode 100644
index 0000000..b186c5a
--- /dev/null
+++ b/docs/src/site/markdown/DG_SqoopActionExtension.md
@@ -0,0 +1,348 @@

[::Go back to Oozie Documentation Index::](index.html)

-----

# Oozie Sqoop Action Extension

<!-- MACRO{toc|fromDepth=1|toDepth=4} -->

## Sqoop Action

**IMPORTANT:** The Sqoop action requires Apache Hadoop 1.x or 2.x.

The `sqoop` action runs a Sqoop job.

The workflow job will wait until the Sqoop job completes before continuing to the next action.

To run the Sqoop job, you have to configure the `sqoop` action with the `resource-manager`, `name-node`, and Sqoop `command` or `arg` elements, as well as configuration.

A `sqoop` action can be configured to create or delete HDFS directories before starting the Sqoop job.

Sqoop configuration can be specified with a file, using the `job-xml` element, and inline, using the `configuration` element.

Oozie EL expressions can be used in the inline configuration. Property values specified in the `configuration` element override values specified in the `job-xml` file.

Note that the YARN `yarn.resourcemanager.address` / `resource-manager` and HDFS `fs.default.name` / `name-node` properties must not be present in the inline configuration.

As with Hadoop `map-reduce` jobs, it is possible to add files and archives in order to make them available to the Sqoop job. Refer to the [WorkflowFunctionalSpec#FilesArchives][Adding Files and Archives for the Job] section for more information about this feature.

**Syntax:**


```
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="[NODE-NAME]">
        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
            <resource-manager>[RESOURCE-MANAGER]</resource-manager>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
                <delete path="[PATH]"/>
                ...
                <mkdir path="[PATH]"/>
                ...
            </prepare>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <command>[SQOOP-COMMAND]</command>
            <arg>[SQOOP-ARGUMENT]</arg>
            ...
            <file>[FILE-PATH]</file>
            ...
            <archive>[FILE-PATH]</archive>
            ...
        </sqoop>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
```

The `prepare` element, if present, indicates a list of paths to delete or create before starting the job. Specified paths must start with `hdfs://HOST:PORT`.

The `job-xml` element, if present, specifies a file containing configuration for the Sqoop job. As of schema 0.3, multiple `job-xml` elements are allowed in order to specify multiple `job.xml` files.

The `configuration` element, if present, contains configuration properties that are passed to the Sqoop job.

**Sqoop command**

The Sqoop command can be specified either using the `command` element or multiple `arg` elements.

When using the `command` element, Oozie will split the command on every space into multiple arguments.

When using the `arg` elements, Oozie will pass each argument value as an argument to Sqoop.

The `arg` variant should be used when there are spaces within a single argument.

Consult the Sqoop documentation for a complete list of valid Sqoop commands.

All the above elements can be parameterized (templatized) using EL expressions.

**Examples:**

Using the `command` element:


```
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="myfirstsqoopjob">
        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
            <resource-manager>foo:8032</resource-manager>
            <name-node>bar:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
        </sqoop>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>
```

The same Sqoop action using `arg` elements:


```
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="myfirstsqoopjob">
        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
            <resource-manager>foo:8032</resource-manager>
            <name-node>bar:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:hsqldb:file:db.hsqldb</arg>
            <arg>--table</arg>
            <arg>TT</arg>
            <arg>--target-dir</arg>
            <arg>hdfs://localhost:8020/user/tucu/foo</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>
```

NOTE: The `arg` element syntax, while more verbose, allows arguments that contain spaces, which is useful when using free-form queries.

### Sqoop Action Counters

The counters of the map-reduce job run by the Sqoop action are available to be used in the workflow via the [hadoop:counters() EL function](WorkflowFunctionalSpec.html#HadoopCountersEL).

If the Sqoop action runs an import-all command, the `hadoop:counters()` EL function will return the aggregated counters of all map-reduce jobs run by the Sqoop import-all command.
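Counters exposed this way can drive a downstream decision node. The fragment below is an illustrative sketch only: the action name `sqoop-import` and the counter group/name are hypothetical and depend on the Hadoop version in use.

```
<decision name="check-import-size">
    <switch>
        <!-- Hypothetical lookup: branch on the record count of the Sqoop action's MR job -->
        <case to="process-data">
            ${hadoop:counters("sqoop-import")["org.apache.hadoop.mapred.Task$Counter"]["MAP_OUTPUT_RECORDS"] gt 0}
        </case>
        <default to="no-data"/>
    </switch>
</decision>
```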

### Sqoop Action Logging

Sqoop action logs are redirected to the STDOUT/STDERR of the Oozie Launcher map-reduce job task that runs Sqoop.

From the Oozie web-console, using the 'Console URL' link in the Sqoop action pop-up, it is possible to navigate to the Oozie Launcher map-reduce job task logs via the Hadoop job-tracker web-console.

The logging level of the Sqoop action can be set in the Sqoop action configuration using the property `oozie.sqoop.log.level`. The default value is `INFO`.

## Appendix, Sqoop XML-Schema

### AE.A Appendix A, Sqoop XML-Schema

#### Sqoop Action Schema Version 1.0

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sqoop="uri:oozie:sqoop-action:1.0"
           elementFormDefault="qualified"
           targetNamespace="uri:oozie:sqoop-action:1.0">

    <xs:include schemaLocation="oozie-common-1.0.xsd"/>

    <xs:element name="sqoop" type="sqoop:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:choice>
                <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/>
                <xs:element name="resource-manager" type="xs:string" minOccurs="0" maxOccurs="1"/>
            </xs:choice>
            <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="launcher" type="sqoop:LAUNCHER" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
            <xs:choice>
                <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
                <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
            </xs:choice>
            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>
```

#### Sqoop Action Schema Version 0.3

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sqoop="uri:oozie:sqoop-action:0.3" elementFormDefault="qualified"
           targetNamespace="uri:oozie:sqoop-action:0.3">

    <xs:element name="sqoop" type="sqoop:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
            <xs:choice>
                <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
                <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
            </xs:choice>
            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="CONFIGURATION">
        <xs:sequence>
            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="PREPARE">
        <xs:sequence>
            <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="DELETE">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

    <xs:complexType name="MKDIR">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

</xs:schema>
```

#### Sqoop Action Schema Version 0.2

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sqoop="uri:oozie:sqoop-action:0.2" elementFormDefault="qualified"
           targetNamespace="uri:oozie:sqoop-action:0.2">

    <xs:element name="sqoop" type="sqoop:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
            <xs:choice>
                <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
                <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
            </xs:choice>
            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="CONFIGURATION">
        <xs:sequence>
            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="PREPARE">
        <xs:sequence>
            <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="DELETE">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

    <xs:complexType name="MKDIR">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>

</xs:schema>
```

[::Go back to Oozie Documentation Index::](index.html)

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_SshActionExtension.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_SshActionExtension.md b/docs/src/site/markdown/DG_SshActionExtension.md
new file mode 100644
index 0000000..e53e1c3
--- /dev/null
+++ b/docs/src/site/markdown/DG_SshActionExtension.md
@@ -0,0 +1,161 @@

[::Go back to Oozie Documentation Index::](index.html)

-----

# Oozie Ssh Action Extension

<!-- MACRO{toc|fromDepth=1|toDepth=4} -->

## Ssh Action

The `ssh` action starts a shell command on a remote machine as a remote secure shell in the background. The workflow job will wait until the remote shell command completes before continuing to the next action.

The shell command must be present on the remote machine and it must be available for execution via the command path.

The shell command is executed in the home directory of the specified user on the remote host.

The output (STDOUT) of the ssh job can be made available to the workflow job after the ssh job ends. This information could be used from within decision nodes. If the output of the ssh job is made available to the workflow job, the shell command must meet the following requirements:

 * The format of the output must be a valid Java Properties file.
 * The size of the output must not exceed 2KB.

Note: the Ssh action will fail if any output is written to standard error / output upon login (e.g. the remote user's `.bashrc` contains `ls -a`).
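The output requirements above can be exercised locally. The script below is a hypothetical sketch of a remote command that emits `capture-output`-compatible STDOUT; the file path and property keys are made up for illustration:

```shell
#!/bin/sh
# Hypothetical remote command: emits output in Java Properties format
# (one key=value pair per line), as required when <capture-output/> is used.
OUT=/tmp/oozie_ssh_demo.properties
printf 'rows_loaded=42\nstatus=ok\n' | tee "$OUT"
# The captured output must not exceed 2KB, so verify the size up front.
test "$(wc -c < "$OUT")" -le 2048 && echo "within 2KB limit"
```

Keys emitted this way become readable from the workflow via the action EL functions once the action completes.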

Note: the Ssh action will also fail if Oozie fails to connect over ssh to the host for the action status check (e.g. the host is under heavy load, or the network is bad) after a configurable number of retries (3 by default). The first retry waits a configurable period of time (3 seconds by default) before the check. Each subsequent retry waits twice the previous wait time.

**Syntax:**


```
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="[NODE-NAME]">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>[USER]@[HOST]</host>
            <command>[SHELL]</command>
            <args>[ARGUMENTS]</args>
            ...
            <capture-output/>
        </ssh>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
```

The `host` element indicates the user and host where the shell will be executed.

**IMPORTANT:** The `oozie.action.ssh.allow.user.at.host` property, in the `oozie-site.xml` configuration, indicates whether a user other than the one submitting the job can be used for the ssh invocation. By default this property is set to `true`.

The `command` element indicates the shell command to execute.

The `args` element, if present, contains parameters to be passed to the shell command. If more than one `args` element is present, they are concatenated in order. When an `args` element contains a space, even when quoted, it will be considered as separate arguments (i.e. "Hello World" becomes "Hello" and "World"). Starting with ssh schema 0.2, you can use the `arg` element (note that this is different from the `args` element) to specify arguments that have a space in them (i.e. "Hello World" is preserved as "Hello World"). You can use either `args` elements, `arg` elements, or neither; but not both in the same action.

If the `capture-output` element is present, it instructs Oozie to capture the STDOUT of the ssh command execution. The ssh command output must be in Java Properties file format and it must not exceed 2KB.
From within the workflow definition, the output of an ssh action node is accessible via the `String action:output(String node, String key)` function (refer to section '4.2.6 Action EL Functions').

The configuration of the `ssh` action can be parameterized (templatized) using EL expressions.

**Example:**


```
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="myssjob">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>f...@bar.com</host>
            <command>uploaddata</command>
            <args>jdbc:derby://bar.com:1527/myDB</args>
            <args>hdfs://foobar.com:8020/usr/tucu/myData</args>
        </ssh>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>
```

In the above example, the `uploaddata` shell command is executed with two arguments, `jdbc:derby://bar.com:1527/myDB` and `hdfs://foobar.com:8020/usr/tucu/myData`.

The `uploaddata` shell command must be present on the remote host and available in the command path.

The output of the command will be ignored because the `capture-output` element is not present.

## Appendix, Ssh XML-Schema

### AE.A Appendix A, Ssh XML-Schema

#### Ssh Action Schema Version 0.2


```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ssh="uri:oozie:ssh-action:0.2" elementFormDefault="qualified"
           targetNamespace="uri:oozie:ssh-action:0.2">

    <xs:element name="ssh" type="ssh:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:element name="host" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:choice>
                <xs:element name="args" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            </xs:choice>
            <xs:element name="capture-output" type="ssh:FLAG" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="FLAG"/>

</xs:schema>
```

#### Ssh Action Schema Version 0.1


```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ssh="uri:oozie:ssh-action:0.1" elementFormDefault="qualified"
           targetNamespace="uri:oozie:ssh-action:0.1">

    <xs:element name="ssh" type="ssh:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:element name="host" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="args" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="capture-output" type="ssh:FLAG" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="FLAG"/>

</xs:schema>
```

[::Go back to Oozie Documentation Index::](index.html)


http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_WorkflowReRun.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_WorkflowReRun.md b/docs/src/site/markdown/DG_WorkflowReRun.md
new file mode 100644
index 0000000..c128681
--- /dev/null
+++ b/docs/src/site/markdown/DG_WorkflowReRun.md
@@ -0,0 +1,42 @@

[::Go back to Oozie Documentation Index::](index.html)

# Workflow ReRun

<!-- MACRO{toc|fromDepth=1|toDepth=4} -->

## Configs

   * oozie.wf.application.path
   * Exactly one of the following two configurations is mandatory; they must not both be defined at the same time:
      * oozie.wf.rerun.skip.nodes
      * oozie.wf.rerun.failnodes
   * Skip nodes are specified as a comma-separated list of action names. They can be any action nodes, including decision nodes.
   * The valid values of `oozie.wf.rerun.failnodes` are true and false.
   * If a secured Hadoop version is used, the following two properties need to be specified as well:
      * mapreduce.jobtracker.kerberos.principal
      * dfs.namenode.kerberos.principal
   * Configurations can be passed as `-D` parameters.
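As an illustrative sketch, the rerun configuration listed above can also be collected in a properties file and passed to the CLI with `-config` instead of individual `-D` parameters (the application path and the file name `job-rerun.properties` are hypothetical):


```
# job-rerun.properties (hypothetical name and path)
oozie.wf.application.path=hdfs://foo:8020/user/joe/myapp
# rerun only the failed nodes; mutually exclusive with oozie.wf.rerun.skip.nodes
oozie.wf.rerun.failnodes=true
```

With such a file, `oozie job -rerun <job-id> -config job-rerun.properties` would pick up these properties.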

```
$ oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.skip.nodes=<>
```

## Pre-Conditions

   * Workflow with id wfId should exist.
   * Workflow with id wfId should be in SUCCEEDED/KILLED/FAILED state.
   * If specified, the nodes in the `oozie.wf.rerun.skip.nodes` configuration must have completed successfully.

## ReRun

   * Reloads the configs.
   * If no configuration is passed, the existing coordinator/workflow configuration will be used. If a configuration is passed, it will be merged with the existing workflow configuration, with the input configuration taking precedence.
   * Currently there is no way to remove an existing configuration property; it can only be overridden by passing a different value in the input configuration.
   * Creates a new Workflow Instance with the same wfId.
   * Deletes the actions that are not skipped from the DB and, for skipped actions, copies data from the old Workflow Instance to the new one.
   * The action handler will skip the nodes given in the config, with the same exit transition as before.

[::Go back to Oozie Documentation Index::](index.html)


http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/ENG_MiniOozie.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/ENG_MiniOozie.md b/docs/src/site/markdown/ENG_MiniOozie.md
new file mode 100644
index 0000000..e793676
--- /dev/null
+++ b/docs/src/site/markdown/ENG_MiniOozie.md
@@ -0,0 +1,83 @@

[::Go back to Oozie Documentation Index::](index.html)

# Running MiniOozie Tests

<!-- MACRO{toc|fromDepth=1|toDepth=4} -->

## System Requirements

   * Unix box (tested on Mac OS X and Linux)
   * Java JDK 1.8+
   * Eclipse (tested on 3.5 and 3.6)
   * [Maven 3.0.1+](http://maven.apache.org/)

The Maven command (mvn) must be in the command path.

## Installing Oozie Jars To Maven Cache

The Oozie source tree is available from Apache SVN or Apache Git.
The MiniOozie sample project is under the Oozie source tree.

The following command checks out Oozie trunk locally:


```
$ svn co https://svn.apache.org/repos/asf/incubator/oozie/trunk
```

OR


```
$ git clone git://github.com/apache/oozie.git
```

To run MiniOozie tests, the required jars such as `oozie-core`, `oozie-client`, and `oozie-core-tests` need to be
available in a remote Maven repository or in the local Maven repository. The above jars can be built and installed
into the local Maven cache using the command:


```
$ mvn clean install -DskipTests -DtestJarSimple
```

The following properties should be specified to install the correct jars for MiniOozie:

   * -DskipTests : skip executing the Oozie unit tests
   * -DtestJarSimple= : build only the required test classes into oozie-core-tests

MiniOozie is a folder named `minitest` under the Oozie source tree. Two sample tests are included in the project.
The following commands execute the tests under MiniOozie:


```
$ cd minitest
$ mvn clean test
```

## Create Tests Using MiniOozie

MiniOozie provides a JUnit test base class for testing Oozie applications such as workflows and coordinators. A test case
needs to extend `MiniOozieTestCase` and, like the example class `WorkflowTest.java`, create the Oozie
workflow application properties and workflow XML. The example file is under the Oozie source tree:

   * `minitest/src/test/java/org/apache/oozie/test/WorkflowTest.java`

## IDE Setup

Eclipse and IntelliJ can use the MiniOozie Maven project files directly. The MiniOozie project can be imported into
Eclipse and IntelliJ as an independent project.

The test directories under MiniOozie are:

   * `minitest/src/test/java` : the test-source directory
   * `minitest/src/test/resources` : the test-resource directory

Asynchronous actions such as the FS action can also be used and tested via the `LocalOozie` / `OozieClient` API.
Please see the `fs-decision.xml` workflow example.

[::Go back to Oozie Documentation Index::](index.html)