http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/twiki/DG_SparkActionExtension.twiki ---------------------------------------------------------------------- diff --git a/docs/src/site/twiki/DG_SparkActionExtension.twiki b/docs/src/site/twiki/DG_SparkActionExtension.twiki deleted file mode 100644 index 5a56cca..0000000 --- a/docs/src/site/twiki/DG_SparkActionExtension.twiki +++ /dev/null @@ -1,436 +0,0 @@ - - -[::Go back to Oozie Documentation Index::](index.html) - ------ - -# Oozie Spark Action Extension - -<!-- MACRO{toc|fromDepth=1|toDepth=4} --> - -## Spark Action - -The `spark` action runs a Spark job. - -The workflow job will wait until the Spark job completes before -continuing to the next action. - -To run the Spark job, you have to configure the `spark` action with -the `resource-manager`, `name-node`, Spark `master` elements as -well as the necessary elements, arguments and configuration. - -Spark options can be specified in an element called `spark-opts`. - -A `spark` action can be configured to create or delete HDFS directories -before starting the Spark job. - -Oozie EL expressions can be used in the inline configuration. Property -values specified in the `configuration` element override values specified -in the `job-xml` file. - -**Syntax:** - - -``` -<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0"> - ... - <action name="[NODE-NAME]"> - <spark xmlns="uri:oozie:spark-action:1.0"> - <resource-manager>[RESOURCE-MANAGER]</resource-manager> - <name-node>[NAME-NODE]</name-node> - <prepare> - <delete path="[PATH]"/> - ... - <mkdir path="[PATH]"/> - ... - </prepare> - <job-xml>[SPARK SETTINGS FILE]</job-xml> - <configuration> - <property> - <name>[PROPERTY-NAME]</name> - <value>[PROPERTY-VALUE]</value> - </property> - ... 
-            </configuration>
-            <master>[SPARK MASTER URL]</master>
-            <mode>[SPARK MODE]</mode>
-            <name>[SPARK JOB NAME]</name>
-            <class>[SPARK MAIN CLASS]</class>
-            <jar>[SPARK DEPENDENCIES JAR / PYTHON FILE]</jar>
-            <spark-opts>[SPARK-OPTIONS]</spark-opts>
-            <arg>[ARG-VALUE]</arg>
-                ...
-            <arg>[ARG-VALUE]</arg>
-                ...
-        </spark>
-        <ok to="[NODE-NAME]"/>
-        <error to="[NODE-NAME]"/>
-    </action>
-    ...
-</workflow-app>
-```
-
-The `prepare` element, if present, indicates a list of paths to delete
-or create before starting the job. Specified paths must start with `hdfs://HOST:PORT`.
-
-The `job-xml` element, if present, specifies a file containing configuration
-for the Spark job. Multiple `job-xml` elements are allowed in order to
-specify multiple `job.xml` files.
-
-The `configuration` element, if present, contains configuration
-properties that are passed to the Spark job.
-
-The `master` element indicates the URL of the Spark Master. Ex: `spark://host:port`, `mesos://host:port`, `yarn-cluster`, `yarn-client`,
-or `local`.
-
-The `mode` element, if present, indicates the Spark deployment mode, i.e. where the Spark driver program runs. Ex: `client`, `cluster`. This is typically
-not required because you can specify it as part of `master` (i.e. `master=yarn, mode=client` is equivalent to `master=yarn-client`).
-A local `master` always runs in client mode.
-
-Depending on the `master` (and `mode`) entered, the Spark job will run differently as follows:
-
-   * local mode: everything runs here in the Launcher Job.
-   * yarn-client mode: the driver runs here in the Launcher Job and the executors run in Yarn.
-   * yarn-cluster mode: the driver and executors run in Yarn.
-
-The `name` element indicates the name of the Spark application.
-
-The `class` element, if present, indicates the Spark application's main class.
-
-The `jar` element indicates a comma-separated list of jars or Python files.
-
-The `spark-opts` element, if present, contains a list of Spark options that can be passed to Spark.
Spark configuration
-options can be passed by specifying '--conf key=value' or other Spark CLI options.
-Values containing whitespaces can be enclosed in double quotes.
-
-Some examples of the `spark-opts` element:
-
-   * '--conf key=value'
-   * '--conf key1=value1 value2'
-   * '--conf key1="value1 value2"'
-   * '--conf key1=value1 key2="value2 value3"'
-   * '--conf key=value --verbose --properties-file user.properties'
-
-There are several ways to define properties that will be passed to Spark. They are processed in the following order:
-
-   * propagated from `oozie.service.SparkConfigurationService.spark.configurations`
-   * read from a localized `spark-defaults.conf` file
-   * read from a file defined in `spark-opts` via the `--properties-file` option
-   * properties defined in the `spark-opts` element
-
-(Entries later in this list take precedence over earlier ones.)
-The server-propagated properties, the `spark-defaults.conf` and the user-defined properties file are merged into a
-single properties file, as Spark handles only one file in its `--properties-file` option.
-
-The `arg` element, if present, contains arguments that are passed to the Spark application.
-
-In case some property values are present both in `spark-defaults.conf` and as property key/value pairs generated by Oozie, the
-user-configured values from `spark-defaults.conf` are prepended to the ones generated by Oozie, as part of the Spark arguments list.
-
-The following properties are prepended to the Spark arguments:
-
-   * `spark.executor.extraClassPath`
-   * `spark.driver.extraClassPath`
-   * `spark.executor.extraJavaOptions`
-   * `spark.driver.extraJavaOptions`
-
-All the above elements can be parameterized (templatized) using EL
-expressions.
-
-**Example:**
-
-
-```
-<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
-    ...
-    <action name="myfirstsparkjob">
-        <spark xmlns="uri:oozie:spark-action:1.0">
-            <resource-manager>foo:8032</resource-manager>
-            <name-node>bar:8020</name-node>
-            <prepare>
-                <delete path="${jobOutput}"/>
-            </prepare>
-            <configuration>
-                <property>
-                    <name>mapred.compress.map.output</name>
-                    <value>true</value>
-                </property>
-            </configuration>
-            <master>local[*]</master>
-            <mode>client</mode>
-            <name>Spark Example</name>
-            <class>org.apache.spark.examples.mllib.JavaALS</class>
-            <jar>/lib/spark-examples_2.10-1.1.0.jar</jar>
-            <spark-opts>--executor-memory 20G --num-executors 50
-             --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>
-            <arg>inputpath=hdfs://localhost/input/file.txt</arg>
-            <arg>value=2</arg>
-        </spark>
-        <ok to="myotherjob"/>
-        <error to="errorcleanup"/>
-    </action>
-    ...
-</workflow-app>
-```
-
-### Spark Action Logging
-
-Spark action logs are redirected to the STDOUT/STDERR of the Oozie Launcher map-reduce job task that runs Spark.
-
-From the Oozie web-console, it is possible to navigate to the Oozie Launcher map-reduce job task logs via the
-Hadoop job-tracker web-console, using the 'Console URL' link in the Spark action pop-up.
-
-### Spark on YARN
-
-To ensure that your Spark job shows up in the Spark History Server, make sure to specify these three Spark configuration properties
-either in `spark-opts` with `--conf` or from `oozie.service.SparkConfigurationService.spark.configurations` in oozie-site.xml.
-
-1. spark.yarn.historyServer.address=SPH-HOST:18088
-
-2. spark.eventLog.dir=`hdfs://NN:8020/user/spark/applicationHistory`
-
-3. spark.eventLog.enabled=true
-
-### PySpark with Spark Action
-
-To submit PySpark scripts with Spark Action, pyspark dependencies must be available in the sharelib or in the workflow's lib/ directory.
-For more information, please refer to the [installation document](AG_Install.html#Oozie_Share_Lib).
-
-**Example:**
-
-
-```
-<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
-    ....
-    <action name="myfirstpysparkjob">
-        <spark xmlns="uri:oozie:spark-action:1.0">
-            <resource-manager>foo:8032</resource-manager>
-            <name-node>bar:8020</name-node>
-            <prepare>
-                <delete path="${jobOutput}"/>
-            </prepare>
-            <configuration>
-                <property>
-                    <name>mapred.compress.map.output</name>
-                    <value>true</value>
-                </property>
-            </configuration>
-            <master>yarn-cluster</master>
-            <name>Spark Example</name>
-            <jar>pi.py</jar>
-            <spark-opts>--executor-memory 20G --num-executors 50
-             --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>
-            <arg>100</arg>
-        </spark>
-        <ok to="myotherjob"/>
-        <error to="errorcleanup"/>
-    </action>
-    ...
-</workflow-app>
-```
-
-The `jar` element indicates the Python file. Refer to the file by its localized name, because only local files are allowed
-in PySpark. The .py file should be in the lib/ folder next to the workflow.xml, or added using the `file` element, so that
-it is localized to the working directory with just its name.
-
-### Using Symlink in \<jar\>
-
-A symlink must be specified using the [file](WorkflowFunctionalSpec.html#a3.2.2.1_Adding_Files_and_Archives_for_the_Job) element. Then, you can use
-the symlink name in the `jar` element.
-
-**Example:**
-
-Specifying a relative path for the symlink:
-
-Make sure that the file is within the application directory, i.e. `oozie.wf.application.path`.
-
-```
-    <spark xmlns="uri:oozie:spark-action:1.0">
-        ...
-        <jar>py-spark-example-symlink.py</jar>
-        ...
-        ...
-        <file>py-spark.py#py-spark-example-symlink.py</file>
-        ...
-    </spark>
-```
-
-Specifying the full path for the symlink:
-
-```
-    <spark xmlns="uri:oozie:spark-action:1.0">
-        ...
-        <jar>spark-example-symlink.jar</jar>
-        ...
-        ...
- <file>hdfs://localhost:8020/user/testjars/all-oozie-examples.jar#spark-example-symlink.jar</file> - ... - </spark> -``` - - - -## Appendix, Spark XML-Schema - -### AE.A Appendix A, Spark XML-Schema - -#### Spark Action Schema Version 1.0 - -``` -<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" - xmlns:spark="uri:oozie:spark-action:1.0" elementFormDefault="qualified" - targetNamespace="uri:oozie:spark-action:1.0"> -. - <xs:include schemaLocation="oozie-common-1.0.xsd"/> -. - <xs:element name="spark" type="spark:ACTION"/> -. - <xs:complexType name="ACTION"> - <xs:sequence> - <xs:choice> - <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="resource-manager" type="xs:string" minOccurs="0" maxOccurs="1"/> - </xs:choice> - <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/> - <xs:element name="launcher" type="spark:LAUNCHER" minOccurs="0" maxOccurs="1"/> - <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/> - <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> -. 
-</xs:schema> -``` - -#### Spark Action Schema Version 0.2 - -``` -<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" - xmlns:spark="uri:oozie:spark-action:0.2" elementFormDefault="qualified" - targetNamespace="uri:oozie:spark-action:0.2"> - - <xs:element name="spark" type="spark:ACTION"/> - - <xs:complexType name="ACTION"> - <xs:sequence> - <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/> - <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/> - <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="CONFIGURATION"> - <xs:sequence> - <xs:element name="property" minOccurs="1" maxOccurs="unbounded"> - <xs:complexType> - <xs:sequence> - <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/> - <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/> - <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/> - </xs:sequence> - </xs:complexType> - </xs:element> - </xs:sequence> - </xs:complexType> - - <xs:complexType 
name="PREPARE"> - <xs:sequence> - <xs:element name="delete" type="spark:DELETE" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="mkdir" type="spark:MKDIR" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="DELETE"> - <xs:attribute name="path" type="xs:string" use="required"/> - </xs:complexType> - - <xs:complexType name="MKDIR"> - <xs:attribute name="path" type="xs:string" use="required"/> - </xs:complexType> - -</xs:schema> -``` - -#### Spark Action Schema Version 0.1 - -``` -<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" - xmlns:spark="uri:oozie:spark-action:0.1" elementFormDefault="qualified" - targetNamespace="uri:oozie:spark-action:0.1"> - - <xs:element name="spark" type="spark:ACTION"/> - - <xs:complexType name="ACTION"> - <xs:sequence> - <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/> - <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/> - <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="CONFIGURATION"> - <xs:sequence> - <xs:element name="property" minOccurs="1" maxOccurs="unbounded"> - <xs:complexType> - <xs:sequence> - <xs:element name="name" minOccurs="1" 
maxOccurs="1" type="xs:string"/> - <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/> - <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/> - </xs:sequence> - </xs:complexType> - </xs:element> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="PREPARE"> - <xs:sequence> - <xs:element name="delete" type="spark:DELETE" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="mkdir" type="spark:MKDIR" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="DELETE"> - <xs:attribute name="path" type="xs:string" use="required"/> - </xs:complexType> - - <xs:complexType name="MKDIR"> - <xs:attribute name="path" type="xs:string" use="required"/> - </xs:complexType> - -</xs:schema> -``` -[::Go back to Oozie Documentation Index::](index.html) - - - - -
http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/twiki/DG_SqoopActionExtension.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/DG_SqoopActionExtension.twiki b/docs/src/site/twiki/DG_SqoopActionExtension.twiki
deleted file mode 100644
index b186c5a..0000000
--- a/docs/src/site/twiki/DG_SqoopActionExtension.twiki
+++ /dev/null
@@ -1,348 +0,0 @@
-
-
-[::Go back to Oozie Documentation Index::](index.html)
-
------
-
-# Oozie Sqoop Action Extension
-
-<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
-
-## Sqoop Action
-
-**IMPORTANT:** The Sqoop action requires Apache Hadoop 1.x or 2.x.
-
-The `sqoop` action runs a Sqoop job.
-
-The workflow job will wait until the Sqoop job completes before
-continuing to the next action.
-
-To run the Sqoop job, you have to configure the `sqoop` action with the `resource-manager`, `name-node` and Sqoop `command`
-or `arg` elements as well as configuration.
-
-A `sqoop` action can be configured to create or delete HDFS directories
-before starting the Sqoop job.
-
-Sqoop configuration can be specified with a file, using the `job-xml`
-element, and inline, using the `configuration` elements.
-
-Oozie EL expressions can be used in the inline configuration. Property
-values specified in the `configuration` element override values specified
-in the `job-xml` file.
-
-Note that YARN `yarn.resourcemanager.address` / `resource-manager` and HDFS `fs.default.name` / `name-node` properties must not
-be present in the inline configuration.
-
-As with Hadoop `map-reduce` jobs, it is possible to add files and
-archives in order to make them available to the Sqoop job. Refer to the
-[Adding Files and Archives for the Job](WorkflowFunctionalSpec.html#FilesArchives)
-section for more information about this feature.
-
-**Syntax:**
-
-
-```
-<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
-    ...
- <action name="[NODE-NAME]"> - <sqoop xmlns="uri:oozie:sqoop-action:1.0"> - <resource-manager>[RESOURCE-MANAGER]</resource-manager> - <name-node>[NAME-NODE]</name-node> - <prepare> - <delete path="[PATH]"/> - ... - <mkdir path="[PATH]"/> - ... - </prepare> - <configuration> - <property> - <name>[PROPERTY-NAME]</name> - <value>[PROPERTY-VALUE]</value> - </property> - ... - </configuration> - <command>[SQOOP-COMMAND]</command> - <arg>[SQOOP-ARGUMENT]</arg> - ... - <file>[FILE-PATH]</file> - ... - <archive>[FILE-PATH]</archive> - ... - </sqoop> - <ok to="[NODE-NAME]"/> - <error to="[NODE-NAME]"/> - </action> - ... -</workflow-app> -``` - -The `prepare` element, if present, indicates a list of paths to delete -or create before starting the job. Specified paths must start with `hdfs://HOST:PORT`. - -The `job-xml` element, if present, specifies a file containing configuration -for the Sqoop job. As of schema 0.3, multiple `job-xml` elements are allowed in order to -specify multiple `job.xml` files. - -The `configuration` element, if present, contains configuration -properties that are passed to the Sqoop job. - -**Sqoop command** - -The Sqoop command can be specified either using the `command` element or multiple `arg` -elements. - -When using the `command` element, Oozie will split the command on every space -into multiple arguments. - -When using the `arg` elements, Oozie will pass each argument value as an argument to Sqoop. - -The `arg` variant should be used when there are spaces within a single argument. - -Consult the Sqoop documentation for a complete list of valid Sqoop commands. - -All the above elements can be parameterized (templatized) using EL -expressions. - -**Examples:** - -Using the `command` element: - - -``` -<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0"> - ... 
-    <action name="myfirstsqoopjob">
-        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
-            <resource-manager>foo:8032</resource-manager>
-            <name-node>bar:8020</name-node>
-            <prepare>
-                <delete path="${jobOutput}"/>
-            </prepare>
-            <configuration>
-                <property>
-                    <name>mapred.compress.map.output</name>
-                    <value>true</value>
-                </property>
-            </configuration>
-            <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
-        </sqoop>
-        <ok to="myotherjob"/>
-        <error to="errorcleanup"/>
-    </action>
-    ...
-</workflow-app>
-```
-
-The same Sqoop action using `arg` elements:
-
-
-```
-<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
-    ...
-    <action name="myfirstsqoopjob">
-        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
-            <resource-manager>foo:8032</resource-manager>
-            <name-node>bar:8020</name-node>
-            <prepare>
-                <delete path="${jobOutput}"/>
-            </prepare>
-            <configuration>
-                <property>
-                    <name>mapred.compress.map.output</name>
-                    <value>true</value>
-                </property>
-            </configuration>
-            <arg>import</arg>
-            <arg>--connect</arg>
-            <arg>jdbc:hsqldb:file:db.hsqldb</arg>
-            <arg>--table</arg>
-            <arg>TT</arg>
-            <arg>--target-dir</arg>
-            <arg>hdfs://localhost:8020/user/tucu/foo</arg>
-            <arg>-m</arg>
-            <arg>1</arg>
-        </sqoop>
-        <ok to="myotherjob"/>
-        <error to="errorcleanup"/>
-    </action>
-    ...
-</workflow-app>
-```
-
-NOTE: The `arg` elements syntax, while more verbose, allows having spaces in a single argument, something useful when
-using free form queries.
-
-### Sqoop Action Counters
-
-The counters of the map-reduce job run by the Sqoop action are available to be used in the workflow via the
-[hadoop:counters() EL function](WorkflowFunctionalSpec.html#HadoopCountersEL).
-
-If the Sqoop action runs an import all command, the `hadoop:counters()` EL function will return the aggregated counters
-of all map-reduce jobs run by the Sqoop import all command.
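As a sketch of how these aggregated counters might be consumed, a decision node could branch on a record count. This fragment is illustrative only: the action name `sqoop-import`, the transition targets, and the counter group/name (which depend on the Hadoop version in use) are assumptions, not taken from this document.

```xml
<decision name="check-import">
    <switch>
        <!-- hadoop:counters() returns the counters of the map-reduce job
             run by the named action; group and counter names are hypothetical -->
        <case to="process-data">
            ${hadoop:counters("sqoop-import")["org.apache.hadoop.mapred.Task$Counter"]["MAP_OUTPUT_RECORDS"] gt 0}
        </case>
        <default to="no-data"/>
    </switch>
</decision>
```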
-
-### Sqoop Action Logging
-
-Sqoop action logs are redirected to the STDOUT/STDERR of the Oozie Launcher map-reduce job task that runs Sqoop.
-
-From the Oozie web-console, it is possible to navigate to the Oozie Launcher map-reduce job task logs via the
-Hadoop job-tracker web-console, using the 'Console URL' link in the Sqoop action pop-up.
-
-The logging level of the Sqoop action can be set in the Sqoop action configuration using the
-property `oozie.sqoop.log.level`. The default value is `INFO`.
-
-## Appendix, Sqoop XML-Schema
-
-### AE.A Appendix A, Sqoop XML-Schema
-
-#### Sqoop Action Schema Version 1.0
-
-```
-<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
-           xmlns:sqoop="uri:oozie:sqoop-action:1.0"
-           elementFormDefault="qualified"
-           targetNamespace="uri:oozie:sqoop-action:1.0">
-.
-    <xs:include schemaLocation="oozie-common-1.0.xsd"/>
-.
-    <xs:element name="sqoop" type="sqoop:ACTION"/>
-.
-    <xs:complexType name="ACTION">
-        <xs:sequence>
-            <xs:choice>
-                <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/>
-                <xs:element name="resource-manager" type="xs:string" minOccurs="0" maxOccurs="1"/>
-            </xs:choice>
-            <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/>
-            <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/>
-            <xs:element name="launcher" type="sqoop:LAUNCHER" minOccurs="0" maxOccurs="1"/>
-            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
-            <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
-            <xs:choice>
-                <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
-                <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
-            </xs:choice>
-            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
-            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
-        </xs:sequence>
-    </xs:complexType>
-.
-</xs:schema> -``` - -#### Sqoop Action Schema Version 0.3 - -``` -<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" - xmlns:sqoop="uri:oozie:sqoop-action:0.3" elementFormDefault="qualified" - targetNamespace="uri:oozie:sqoop-action:0.3"> - - <xs:element name="sqoop" type="sqoop:ACTION"/> - - <xs:complexType name="ACTION"> - <xs:sequence> - <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/> - <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/> - <xs:choice> - <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/> - </xs:choice> - <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="CONFIGURATION"> - <xs:sequence> - <xs:element name="property" minOccurs="1" maxOccurs="unbounded"> - <xs:complexType> - <xs:sequence> - <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/> - <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/> - <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/> - </xs:sequence> - </xs:complexType> - </xs:element> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="PREPARE"> - <xs:sequence> - <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:complexType name="DELETE"> - <xs:attribute name="path" type="xs:string" use="required"/> - </xs:complexType> 
- - <xs:complexType name="MKDIR"> - <xs:attribute name="path" type="xs:string" use="required"/> - </xs:complexType> - -</xs:schema> -``` - -#### Sqoop Action Schema Version 0.2 - -``` -<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" - xmlns:sqoop="uri:oozie:sqoop-action:0.2" elementFormDefault="qualified" - targetNamespace="uri:oozie:sqoop-action:0.2"> - - <xs:element name="sqoop" type="sqoop:ACTION"/> -. - <xs:complexType name="ACTION"> - <xs:sequence> - <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/> - <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="1"/> - <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/> - <xs:choice> - <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/> - <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/> - </xs:choice> - <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> -. - <xs:complexType name="CONFIGURATION"> - <xs:sequence> - <xs:element name="property" minOccurs="1" maxOccurs="unbounded"> - <xs:complexType> - <xs:sequence> - <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/> - <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/> - <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/> - </xs:sequence> - </xs:complexType> - </xs:element> - </xs:sequence> - </xs:complexType> -. - <xs:complexType name="PREPARE"> - <xs:sequence> - <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> -. 
-    <xs:complexType name="DELETE">
-        <xs:attribute name="path" type="xs:string" use="required"/>
-    </xs:complexType>
-.
-    <xs:complexType name="MKDIR">
-        <xs:attribute name="path" type="xs:string" use="required"/>
-    </xs:complexType>
-.
-</xs:schema>
-```
-
-[::Go back to Oozie Documentation Index::](index.html)
-

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/twiki/DG_SshActionExtension.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/DG_SshActionExtension.twiki b/docs/src/site/twiki/DG_SshActionExtension.twiki
deleted file mode 100644
index e53e1c3..0000000
--- a/docs/src/site/twiki/DG_SshActionExtension.twiki
+++ /dev/null
@@ -1,161 +0,0 @@
-
-
-[::Go back to Oozie Documentation Index::](index.html)
-
------
-
-# Oozie Ssh Action Extension
-
-<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
-
-## Ssh Action
-
-The `ssh` action starts a shell command on a remote machine as a remote secure shell running in the background. The workflow job
-will wait until the remote shell command completes before continuing to the next action.
-
-The shell command must be present on the remote machine and it must be available for execution via the command path.
-
-The shell command is executed in the home directory of the specified user on the remote host.
-
-The output (STDOUT) of the ssh job can be made available to the workflow job after the ssh job ends. This information
-could be used from within decision nodes. If the output of the ssh job is made available to the workflow job, the shell
-command must meet the following requirements:
-
-   * The format of the output must be a valid Java Properties file.
-   * The size of the output must not exceed 2KB.
-
-Note: the Ssh Action will fail if any output is written to standard error / output upon login (e.g. the `.bashrc` of the remote
-user contains `ls -a`).
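The two output requirements above can be sanity-checked locally before wiring a script into an ssh action. This is a minimal sketch, not part of Oozie, and it assumes a simple `key=value` properties format without comments or line continuations:

```shell
#!/bin/sh
# Check a candidate output file against the capture-output constraints:
# every non-blank line must look like key=value, and size must be <= 2KB.
check_capture_output() {
    f="$1"
    # Size check: the output must not exceed 2048 bytes.
    [ "$(wc -c < "$f" | tr -d ' ')" -le 2048 ] || { echo "too large"; return 1; }
    # Format check: a line is acceptable if it is blank or of the form key=value.
    if grep -qvE '^([A-Za-z0-9_.]+=.*)?$' "$f"; then
        echo "not key=value lines"; return 1
    fi
    echo "ok"
}
```

For example, `printf 'count=42\n' > out.properties; check_capture_output out.properties` prints `ok`, while a file of free-form text is rejected.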
-
-Note: the Ssh Action will fail if Oozie cannot connect to the host via ssh for the action status check
-(e.g. because the host is under heavy load, or the network is bad) after a configurable number of retries (3 by default).
-The first retry will wait a configurable period of time (3 seconds by default) before the check.
-Each subsequent retry will wait twice the previous wait time.
-
-**Syntax:**
-
-
-```
-<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
-    ...
-    <action name="[NODE-NAME]">
-        <ssh xmlns="uri:oozie:ssh-action:0.1">
-            <host>[USER]@[HOST]</host>
-            <command>[SHELL]</command>
-            <args>[ARGUMENTS]</args>
-            ...
-            <capture-output/>
-        </ssh>
-        <ok to="[NODE-NAME]"/>
-        <error to="[NODE-NAME]"/>
-    </action>
-    ...
-</workflow-app>
-```
-
-The `host` indicates the user and host where the shell will be executed.
-
-**IMPORTANT:** The `oozie.action.ssh.allow.user.at.host` property, in the `oozie-site.xml` configuration, indicates whether
-a user other than the one submitting the job can be used for the ssh invocation. By default this property is set
-to `true`.
-
-The `command` element indicates the shell command to execute.
-
-The `args` element, if present, contains parameters to be passed to the shell command. If more than one `args` element
-is present they are concatenated in order. When an `args` element contains a space, even when quoted, it will be split into
-separate arguments (i.e. "Hello World" becomes "Hello" and "World"). Starting with ssh schema 0.2, you can use the `arg` element
-(note that this is different from the `args` element) to specify arguments that have a space in them (i.e. "Hello World" is
-preserved as "Hello World"). You can use either `args` elements, `arg` elements, or neither; but not both in the same action.
-
-If the `capture-output` element is present, it indicates that Oozie should capture the STDOUT of the ssh command
-execution. The ssh command output must be in Java Properties file format and it must not exceed 2KB.
From within the
-workflow definition, the output of an ssh action node is accessible via the `String action:output(String node,
-String key)` function (refer to section '4.2.6 Action EL Functions').
-
-The configuration of the `ssh` action can be parameterized (templatized) using EL expressions.
-
-**Example:**
-
-
-```
-<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
-    ...
-    <action name="myssjob">
-        <ssh xmlns="uri:oozie:ssh-action:0.1">
-            <host>f...@bar.com</host>
-            <command>uploaddata</command>
-            <args>jdbc:derby://bar.com:1527/myDB</args>
-            <args>hdfs://foobar.com:8020/usr/tucu/myData</args>
-        </ssh>
-        <ok to="myotherjob"/>
-        <error to="errorcleanup"/>
-    </action>
-    ...
-</workflow-app>
-```
-
-In the above example, the `uploaddata` shell command is executed with two arguments, `jdbc:derby://bar.com:1527/myDB`
-and `hdfs://foobar.com:8020/usr/tucu/myData`.
-
-The `uploaddata` shell command must be available on the remote host and available in the command path.
-
-The output of the command will be ignored because the `capture-output` element is not present.
-
-## Appendix, Ssh XML-Schema
-
-### AE.A Appendix A, Ssh XML-Schema
-
-#### Ssh Action Schema Version 0.2
-
-
-```
-<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
-           xmlns:ssh="uri:oozie:ssh-action:0.2" elementFormDefault="qualified"
-           targetNamespace="uri:oozie:ssh-action:0.2">
-.
-    <xs:element name="ssh" type="ssh:ACTION"/>
-.
-    <xs:complexType name="ACTION">
-        <xs:sequence>
-            <xs:element name="host" type="xs:string" minOccurs="1" maxOccurs="1"/>
-            <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
-            <xs:choice>
-                <xs:element name="args" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
-                <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
-            </xs:choice>
-            <xs:element name="capture-output" type="ssh:FLAG" minOccurs="0" maxOccurs="1"/>
-        </xs:sequence>
-    </xs:complexType>
-.
-    <xs:complexType name="FLAG"/>
-.
</xs:schema>
```

#### Ssh Action Schema Version 0.1

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ssh="uri:oozie:ssh-action:0.1" elementFormDefault="qualified"
           targetNamespace="uri:oozie:ssh-action:0.1">

    <xs:element name="ssh" type="ssh:ACTION"/>

    <xs:complexType name="ACTION">
        <xs:sequence>
            <xs:element name="host" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="args" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="capture-output" type="ssh:FLAG" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="FLAG"/>

</xs:schema>
```

[::Go back to Oozie Documentation Index::](index.html)

 http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/twiki/DG_WorkflowReRun.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/DG_WorkflowReRun.twiki b/docs/src/site/twiki/DG_WorkflowReRun.twiki
deleted file mode 100644
index c128681..0000000
--- a/docs/src/site/twiki/DG_WorkflowReRun.twiki
+++ /dev/null
@@ -1,42 +0,0 @@

[::Go back to Oozie Documentation Index::](index.html)

# Workflow ReRun

<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
## Configs

 * oozie.wf.application.path
 * Exactly one of the following two configurations is mandatory; they must not both be defined at the same time:
    * oozie.wf.rerun.skip.nodes
    * oozie.wf.rerun.failnodes
 * Skip nodes are given as a comma-separated list of action names; they can be any action nodes, including decision nodes.
 * The valid values of `oozie.wf.rerun.failnodes` are `true` and `false`.
 * If a secured Hadoop version is used, the following two properties need to be specified as well:
    * mapreduce.jobtracker.kerberos.principal
    * dfs.namenode.kerberos.principal
 * Configurations can be passed as -D parameters.
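Complementing the skip-nodes invocation shown next, a failed-nodes-only rerun can be requested the same way; the following is a sketch, and the job id is illustrative:

```
$ oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.failnodes=true
```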

```
$ oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.skip.nodes=<>
```

## Pre-Conditions

 * A workflow with id wfId must exist.
 * The workflow with id wfId must be in the SUCCEEDED, KILLED, or FAILED state.
 * If specified, the nodes in the config oozie.wf.rerun.skip.nodes must have completed successfully.

## ReRun

 * Reloads the configs.
 * If no configuration is passed, the existing coordinator/workflow configuration will be used. If a configuration is passed, it will be merged with the existing workflow configuration, and the input configuration takes precedence.
 * Currently there is no way to remove an existing configuration property; it can only be overridden by passing a different value in the input configuration.
 * Creates a new workflow instance with the same wfId.
 * Deletes the actions that are not skipped from the DB and, for skipped actions, copies data from the old workflow instance to the new one.
 * The action handler will skip the nodes given in the config, keeping the same exit transition as before.

[::Go back to Oozie Documentation Index::](index.html)

 http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/twiki/ENG_MiniOozie.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/ENG_MiniOozie.twiki b/docs/src/site/twiki/ENG_MiniOozie.twiki
deleted file mode 100644
index e793676..0000000
--- a/docs/src/site/twiki/ENG_MiniOozie.twiki
+++ /dev/null
@@ -1,83 +0,0 @@

[::Go back to Oozie Documentation Index::](index.html)

# Running MiniOozie Tests

<!-- MACRO{toc|fromDepth=1|toDepth=4} -->

## System Requirements

 * Unix box (tested on Mac OS X and Linux)
 * Java JDK 1.8+
 * Eclipse (tested on 3.5 and 3.6)
 * [Maven 3.0.1+](http://maven.apache.org/)

The Maven command (mvn) must be in the command path.

## Installing Oozie Jars To Maven Cache

The Oozie source tree is available from Apache SVN and Apache Git.
The MiniOozie sample project is included in the Oozie source tree.

The following command downloads Oozie trunk to a local directory:

```
$ svn co https://svn.apache.org/repos/asf/incubator/oozie/trunk
```

OR

```
$ git clone git://github.com/apache/oozie.git
```

To run MiniOozie tests, the required jars, such as oozie-core, oozie-client, and oozie-core-tests, need to be
available in remote Maven repositories or in the local Maven repository. The local Maven cache for the above
jars can be created and installed using the command:

```
$ mvn clean install -DskipTests -DtestJarSimple
```

The following properties should be specified to install the correct jars for MiniOozie:

 * -DskipTests : skip executing the Oozie unit tests
 * -DtestJarSimple= : build only the test classes required for oozie-core-tests

MiniOozie is a folder named 'minitest' under the Oozie source tree. Two sample tests are included in the project.
The following commands execute the tests under MiniOozie:

```
$ cd minitest
$ mvn clean test
```

## Create Tests Using MiniOozie

MiniOozie is a JUnit test class for testing Oozie applications such as workflows and coordinators. A test case
needs to extend MiniOozieTestCase and, like the example class 'WorkflowTest.java', create the Oozie
workflow application properties and workflow XML. The example file is under the Oozie source tree:

 * `minitest/src/test/java/org/apache/oozie/test/WorkflowTest.java`

## IDE Setup

Eclipse and IntelliJ can use the MiniOozie Maven project files directly. The MiniOozie project can be imported into
Eclipse and IntelliJ as an independent project.

The test directories under MiniOozie are:

 * `minitest/src/test/java` : the test-source directory
 * `minitest/src/test/resources` : the test-resource directory

Asynchronous actions such as the FS action can also be used and tested via the `LocalOozie` / `OozieClient` API.
Please see the `fs-decision.xml` workflow example.

[::Go back to Oozie Documentation Index::](index.html)
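To make the workflow-application-properties step of the WorkflowTest pattern described above more concrete, here is a minimal hypothetical properties fragment of the kind such a test assembles; every key value, path, and port below is an illustrative assumption, not taken from the Oozie source:

```
# Hypothetical MiniOozie test properties; all values are illustrative.
oozie.wf.application.path=hdfs://localhost:9000/user/test/app
nameNode=hdfs://localhost:9000
jobTracker=localhost:9001
queueName=default
```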