[
https://issues.apache.org/jira/browse/OOZIE-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323072#comment-15323072
]
Purshotam Shah commented on OOZIE-2445:
---------------------------------------
It's just a doc change; I verified the patch locally.
{code}
********Test tabs.....
---------------
********New Test tabs.....
---------------
*******Test trailing spaces.....
0
---------------
*******Test lines greater than 132
+ * COMBINE : With combine, instances of A and B can be interleaved to get
the final "combined" set of total instances. All datasets in combine should
have the same range defined with the current EL function. Combine does not
support latest and future EL functions. Combine cannot also be nested.
+ * *%BLUE% WAIT (in minutes): %ENDCOLOR%* If all dependencies are not met,
and MIN dependencies are met, then Oozie will keep on waiting for more
instances till wait time elapses or all dependent data are available.
+The conditional logic can be specified using the <input-logic> tag in the
coordinator.xml using the
[[CoordinatorFunctionalSpec#Oozie_Coordinator_Schema_0.5][Oozie Coordinator
Schema 0.5]] and above. If not specified, the default behavior of "AND" of all
defined input dependencies is applied.
+Order of definition of the dataset matters. Availability of inputs is checked
in that order. Only if input instances of the first dataset is not available,
then the input instances of the second dataset will be checked and so on. In
the case of AND or OR, the second dataset is picked only if the first dataset
does not meet all the input dependencies first. In the case of COMBINE, only
the input instances missing on the first dataset are checked for availability
on the other datasets in order and then included.
+coord:dataIn() function can be used to get the comma separated list of
evaluated hdfs paths given the name of the conditional operator.
+With above expression one can specify the dataset as AorB. Action will start
running as soon dataset A or B is available. Dataset "A" has higher precedence
over "B" because it is defined first. Oozie will first check for availability
of dataset A and only if A is not available, availability of dataset B will be
checked.
+
<uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/input-data/rawLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
+
<uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/input-data/rawLogs-2/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
+After the mininum two dependencies are available, processing will wait for
additional 10 minutes to include any dependencies that become available during
that period.
+MIN and WAIT can be used at parent level, which will get propagated to child
node. Above expression is equivalent to dataset A with min = 2 and wait = 10
minutes and dataset B with min = 2 and wait = 10 minutes.
+
<uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/input-data/rawLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
+
<uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/input-data/rawLogs-2/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
+
<uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/output-data/inputLogic/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
13
{code}
The lines flagged by the "lines greater than 132" test all come from the twiki documentation.
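For anyone who wants to reproduce the line-length check by hand, here is a minimal sketch of what such a check might look like. It is not the project's actual pre-commit script: the function name `long_added_lines`, the sample patch, and the decision to measure the line without its leading `+` are all assumptions made for illustration. The trailing 13 in the output above appears to be the number of flagged lines, which matches the 13 added lines listed.

```python
def long_added_lines(patch_text, limit=132):
    """Return added lines of a unified diff whose content exceeds `limit` characters.

    Hypothetical helper: the real check may count the leading '+' or
    treat file headers differently.
    """
    offenders = []
    for line in patch_text.splitlines():
        # Added lines start with '+'; '+++' marks the new-file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            content = line[1:]  # assumption: measure the line without the '+' prefix
            if len(content) > limit:
                offenders.append(content)
    return offenders


# Made-up patch text: one short added line, one long added line,
# and one long context line that should be ignored.
sample_patch = "\n".join([
    "--- a/doc.twiki",
    "+++ b/doc.twiki",
    "+short line",
    "+" + "x" * 140,
    " " + "y" * 140,
])

offenders = long_added_lines(sample_patch)
print(len(offenders))
```

Since twiki paragraphs and long `<uri-template>` values are usually written on a single line, a check like this will flag them even though nothing is wrong with the doc itself.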
> Doc for - Specifying coordinator input datasets in more logical ways
> (OOZIE-1976)
> ----------------------------------------------------------------------------------
>
> Key: OOZIE-2445
> URL: https://issues.apache.org/jira/browse/OOZIE-2445
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: CoordinatorFunctionalSpec.html, OOZIE-2445-V2.patch,
> OOZIE-2445-V3.patch, OOZIE-2445-V4.patch
>
>