Robert Kanter created OOZIE-2018:
------------------------------------
Summary: Coordinator materialization problems with cron syntax
Key: OOZIE-2018
URL: https://issues.apache.org/jira/browse/OOZIE-2018
Project: Oozie
Issue Type: Bug
Components: coordinator
Affects Versions: 4.0.0, trunk
Reporter: Robert Kanter
Suppose you submit the following coordinator job:
{code:xml}
<coordinator-app name="DailySleep"
    frequency="*/2 * * * *"
    start="2013-06-01T00:00Z" end="2013-06-05T00:00Z"
    timezone="America/Los_Angeles"
    xmlns="uri:oozie:coordinator:0.2"
    >
  <controls>
    <timeout>-1</timeout>
    <concurrency>1</concurrency>
    <execution>FIFO</execution>
    <throttle>2</throttle>
  </controls>
  <datasets>
    <dataset name="sleep_time" frequency="${coord:days(1)}"
        initial-instance="2012-05-31T00:00Z"
        timezone="America/Los_Angeles">
      <uri-template>${DAY}</uri-template>
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <action>
    <workflow>
      <app-path>${wf_application_path}</app-path>
      <configuration>
        <property>
          <name>REDUCER_SLEEP_TIME</name>
          <value>120000</value>
        </property>
        <property>
          <name>oozie.use.system.libpath</name>
          <value>true</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
{code}
Here {{$\{wf_application_path}}} points to a workflow that simply runs a sleep
MapReduce job for 2 minutes.
Notice that the above coordinator job is set to run with a frequency of
{{*/2 * * * *}}, which means every 2 minutes, and the throttle is 2.
When you run this job, you’ll see a few anomalies:
# Other than the first action, each action is materialized twice. The action
numbering works fine, but you’ll see two actions for each Nominal Time. You
can see this in the job info below.
# You can’t see this in the job info below, but while it’s running, there are
actually 3 actions READY at the same time, when there should be only 2 (because
throttle was set to 2).
# OOZIE-1680 added an oozie-site config property
{{oozie.service.coord.check.maximum.frequency=true}}, which is supposed to block
jobs with frequencies faster than 5 minutes; it didn’t stop this coordinator.
Points 1 and 2 above are likely the same problem. Point 3 is somewhat trivial.
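For reference, point 3 concerns a server-side setting. A minimal sketch of how
the property would appear in {{oozie-site.xml}}, assuming the usual Hadoop-style
property layout; only the property name and value are taken from this report:
{code:xml}
<!-- oozie-site.xml fragment (sketch): enables the check that is supposed to
     reject coordinators with a frequency faster than 5 minutes -->
<property>
  <name>oozie.service.coord.check.maximum.frequency</name>
  <value>true</value>
</property>
{code}
With this set to {{true}}, the cron-syntax coordinator above was still accepted.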
Here’s the job info (I killed the job before it finished, and I cut out
irrelevant info to make it easier to read):
{noformat}
------------------------------------------------------------------------------------------------------------------------------
ID                                      External ID                           Created                  Nominal Time
------------------------------------------------------------------------------------------------------------------------------
0000005-140922161548481-oozie-oozi-C@1  0000006-140922161548481-oozie-oozi-W  2014-09-22 23:34:38 GMT  2013-06-01 00:00:00 GMT
------------------------------------------------------------------------------------------------------------------------------
0000005-140922161548481-oozie-oozi-C@2  0000007-140922161548481-oozie-oozi-W  2014-09-22 23:34:38 GMT  2013-06-01 00:02:00 GMT
------------------------------------------------------------------------------------------------------------------------------
0000005-140922161548481-oozie-oozi-C@3  0000008-140922161548481-oozie-oozi-W  2014-09-22 23:36:11 GMT  2013-06-01 00:02:00 GMT
------------------------------------------------------------------------------------------------------------------------------
0000005-140922161548481-oozie-oozi-C@4  0000009-140922161548481-oozie-oozi-W  2014-09-22 23:36:11 GMT  2013-06-01 00:04:00 GMT
------------------------------------------------------------------------------------------------------------------------------
0000005-140922161548481-oozie-oozi-C@5  0000005-140922161548481-oozie-oozi-C  2014-09-22 23:41:11 GMT  2013-06-01 00:04:00 GMT
------------------------------------------------------------------------------------------------------------------------------
0000005-140922161548481-oozie-oozi-C@6  0000005-140922161548481-oozie-oozi-C  2014-09-22 23:41:11 GMT  2013-06-01 00:06:00 GMT
------------------------------------------------------------------------------------------------------------------------------
{noformat}
I tried the same coordinator job with the old frequency syntax
({{$\{coord:minutes(2)}}}), and even though we don’t recommend a 2 min
frequency, it actually worked correctly (once I set
{{oozie.service.coord.check.maximum.frequency=false}}, of course). So this
appears to be a problem with the cron syntax. If {{$\{coord:minutes(2)}}}
didn’t work either, then I’d say it’s just one of the quirks of too high a
frequency, but that’s not the case here.
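For comparison, a sketch of the variant that worked: only the {{frequency}}
attribute changes, and the rest of the coordinator is assumed to be identical to
the one above.
{code:xml}
<coordinator-app name="DailySleep"
    frequency="${coord:minutes(2)}"
    start="2013-06-01T00:00Z" end="2013-06-05T00:00Z"
    timezone="America/Los_Angeles"
    xmlns="uri:oozie:coordinator:0.2"
    >
  <!-- controls, datasets, and action unchanged from the cron-syntax version -->
</coordinator-app>
{code}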