Repository: oozie
Updated Branches:
  refs/heads/master 8bb40f3fa -> e9dc8e2e5

OOZIE-2920 Document Distcp can copy files within a cluster (Artem Ervits via rkanter)

Project: http://git-wip-us.apache.org/repos/asf/oozie/repo
Commit: http://git-wip-us.apache.org/repos/asf/oozie/commit/e9dc8e2e
Tree: http://git-wip-us.apache.org/repos/asf/oozie/tree/e9dc8e2e
Diff: http://git-wip-us.apache.org/repos/asf/oozie/diff/e9dc8e2e

Branch: refs/heads/master
Commit: e9dc8e2e51e8cbab9d3f6a1830a5bf4aee839a71
Parents: 8bb40f3
Author: Robert Kanter <[email protected]>
Authored: Tue Jun 20 09:46:42 2017 -0700
Committer: Robert Kanter <[email protected]>
Committed: Tue Jun 20 09:46:42 2017 -0700

----------------------------------------------------------------------
 .../site/twiki/DG_DistCpActionExtension.twiki   | 35 +++++++++++++++++---
 .../src/site/twiki/WorkflowFunctionalSpec.twiki |  7 ++--
 release-log.txt                                 |  1 +
 3 files changed, 37 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/oozie/blob/e9dc8e2e/docs/src/site/twiki/DG_DistCpActionExtension.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/DG_DistCpActionExtension.twiki b/docs/src/site/twiki/DG_DistCpActionExtension.twiki
index 9931a04..260cd25 100644
--- a/docs/src/site/twiki/DG_DistCpActionExtension.twiki
+++ b/docs/src/site/twiki/DG_DistCpActionExtension.twiki
@@ -12,7 +12,9 @@
 
 The =DistCp= action uses Hadoop distributed copy to copy files from one cluster to another or within the same cluster.
 
-*IMPORTANT:* The DistCp action may not work properly with all configurations (secure, insecure) in all versions of Hadoop.
+*IMPORTANT:* The DistCp action may not work properly with all configurations (secure, insecure) in all versions
+of Hadoop. For example, distcp between two secure clusters is tested and works well. Same is true with two insecure
+clusters. In cases where a secure and insecure clusters are involved, distcp will not work.
 
 Both Hadoop clusters have to be configured with proxyuser for the Oozie process as explained
 [[DG_QuickStart#HadoopProxyUser][here]] on the Quick Start page.
@@ -22,15 +24,15 @@ Both Hadoop clusters have to be configured with proxyuser for the Oozie process
 <verbatim>
 <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.4">
     ...
-    <action name="[NODE-NAME]">
+    <action name="distcp-example">
         <distcp xmlns="uri:oozie:distcp-action:0.2">
             <job-tracker>${jobTracker}</job-tracker>
             <name-node>${nameNode1}</name-node>
             <arg>${nameNode1}/path/to/input.txt</arg>
             <arg>${nameNode2}/path/to/output.txt</arg>
         </distcp>
-        <ok to="[NODE-NAME]"/>
-        <error to="[NODE-NAME]"/>
+        <ok to="end"/>
+        <error to="fail"/>
     </action>
     ...
 </workflow-app>
@@ -48,6 +50,31 @@ the action:
 </property>
 </verbatim>
 
+The =DistCp= action is also commonly used to copy files within the same cluster. Cases where copying files within
+a directory to another directory or directories to target directory is supported. Example below will illustrate a
+copy within a cluster, notice the source and target =nameNode= is the same and use of =*= syntax is supported to
+represent only child files or directories within a source directory. For the sake of the example, =jobTracker= and
+=resourceManager= are synonymous.
+
+*Syntax:*
+
+<verbatim>
+<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.4">
+    ...
+    <action name="copy-example">
+        <distcp xmlns="uri:oozie:distcp-action:0.2">
+            <job-tracker>${resourceManager}</job-tracker>
+            <name-node>${nameNode}</name-node>
+            <arg>${nameNode}/path/to/source/*</arg>
+            <arg>${nameNode}/path/to/target/</arg>
+        </distcp>
+        <ok to="end"/>
+        <error to="fail"/>
+    </action>
+    ...
+</workflow-app>
+</verbatim>
+
 ---++ Appendix, DistCp XML-Schema
 
 ---+++ AE.A Appendix A, DistCp XML-Schema

http://git-wip-us.apache.org/repos/asf/oozie/blob/e9dc8e2e/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/WorkflowFunctionalSpec.twiki b/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
index 5a75b99..6bd3e5a 100644
--- a/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
+++ b/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
@@ -1148,9 +1148,12 @@ Path names specified in the =fs= action can be parameterized (templatized) using
 Path name should be specified as a absolute path. In case of =move=, =delete=, =chmod= and =chgrp= commands, a glob pattern can also be specified instead of an absolute path.
 For =move=, glob pattern can only be specified for source path and not the target.
 
-Each file path must specify the file system URI, for move operations, the target must not specified the system URI.
+Each file path must specify the file system URI, for move operations, the target must not specify the system URI.
 
-IMPORTANT: All the commands within =fs= action do not happen atomically, if a =fs= action fails half way in the
+*IMPORTANT:* For the purposes of copying files within a cluster it is recommended to refer to the =distcp= action
+instead. Refer to [[DG_DistCpActionExtension][=distcp=]] action to copy files within a cluster.
+
+*IMPORTANT:* All the commands within =fs= action do not happen atomically, if a =fs= action fails half way in the
 commands being executed, successfully executed commands are not rolled back. The =fs= action, before executing any
 command must check that source paths exist and target paths don't exist (constraint regarding target relaxed for the =move= action. See below for details), thus failing before executing any command.
 Therefore the validity of all paths specified in one =fs= action are evaluated before any of the file operation are

http://git-wip-us.apache.org/repos/asf/oozie/blob/e9dc8e2e/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index cfc94e9..102c292 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -1,5 +1,6 @@
 -- Oozie 5.0.0 release (trunk - unreleased)
 
+OOZIE-2920 Document Distcp can copy files within a cluster (Artem Ervits via rkanter)
 OOZIE-2796 oozie.action.keep.action.dir not getting notice (zgengxb2005 via gezapeti)
 OOZIE-2769 Extend FS action to allow setrep on a file (Artem Ervits via gezapeti)
 OOZIE-2815 amend - Oozie not always display job log (andras.piros via gezapeti)
