[
https://issues.apache.org/jira/browse/OOZIE-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999124#comment-15999124
]
Satish Subhashrao Saley commented on OOZIE-2873:
------------------------------------------------
Running a dry run on the coordinator is not sufficient to catch invalid EL
functions.
1. At job submission time, some dependencies might not be available yet. In
that case the {{CoordActionInputCheckXCommand.resolveCoordConfiguration}} call
is skipped:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L635-L644
Invalid EL functions are then missed and the problem persists.
The newly added unit test does not declare any dependencies, so it behaved as
expected.
2. If we call {{CoordActionInputCheckXCommand.resolveCoordConfiguration}}
explicitly every time and actionInputLogic is present, it throws errors
because it executes a phase-3 EL function:
{code}
Caused by: java.lang.ClassCastException: org.apache.oozie.coord.input.dependency.CoordOldInputDependency cannot be cast to org.apache.oozie.coord.input.dependency.AbstractCoordInputDependency
    at org.apache.oozie.coord.input.logic.CoordInputLogicEvaluatorPhaseThree.<init>(CoordInputLogicEvaluatorPhaseThree.java:42)
    at org.apache.oozie.coord.input.logic.CoordInputLogicEvaluatorUtil.getInputDependencies(CoordInputLogicEvaluatorUtil.java:134)
    at org.apache.oozie.coord.CoordELFunctions.ph3_coord_dataIn(CoordELFunctions.java:549)
{code}
3. I looked into the EL functions, but they are tied to the phases. For
example, we cannot evaluate {{coord:formatTime(coord:nominalTime(), 'DAY')}}
in the first phase because we do not have the value of
{{coord:nominalTime()}} until phase 2.
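To illustrate point 3, here is a minimal sketch of why a submit-time check cannot see the bad pattern. It is a toy model, not Oozie code: the class {{PhasedElSketch}}, the {{Phase}} enum, and {{formatNominalTime}} are hypothetical names, and it assumes the format string ends up in {{java.text.SimpleDateFormat}}, as the coordinator documentation describes for {{coord:formatTime}}.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class PhasedElSketch {

    /** Hypothetical phases; the real Oozie phases carry far more context. */
    enum Phase { SUBMIT, MATERIALIZE }

    /**
     * Toy stand-in for coord:formatTime(coord:nominalTime(), pattern).
     * nominalTime is null until the action has been materialized.
     */
    static String formatNominalTime(Phase phase, Date nominalTime, String pattern) {
        if (phase == Phase.SUBMIT || nominalTime == null) {
            // No nominal time yet: the expression is passed through untouched,
            // so the pattern is never inspected at this point.
            return "${coord:formatTime(coord:nominalTime(), '" + pattern + "')}";
        }
        // Only here is the pattern parsed; an illegal pattern letter makes the
        // SimpleDateFormat constructor throw IllegalArgumentException.
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(nominalTime);
    }

    public static void main(String[] args) {
        Date nominal = new Date(0L); // arbitrary sample value: 1970-01-01T00:00Z

        // Submit time: the valid and the invalid pattern both pass silently.
        System.out.println(formatNominalTime(Phase.SUBMIT, null, "yyyyMMdd"));
        System.out.println(formatNominalTime(Phase.SUBMIT, null, "DAY"));

        // Later phase: the valid pattern resolves, the invalid one fails.
        System.out.println(formatNominalTime(Phase.MATERIALIZE, nominal, "yyyyMMdd"));
        try {
            formatNominalTime(Phase.MATERIALIZE, nominal, "DAY");
        } catch (IllegalArgumentException e) {
            System.out.println("failed late: " + e.getMessage());
        }
    }
}
```

In this model 'DAY' is rejected only once a nominal time exists, because 'A' is not a {{SimpleDateFormat}} pattern letter. A check that runs before that point returns the expression unresolved and reports nothing.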
We should revert this patch.
[~puru] Could you please take a look at this comment? Waiting for +1 to revert
the patch.
> Check El Functions before submitting the coordinator
> ----------------------------------------------------
>
> Key: OOZIE-2873
> URL: https://issues.apache.org/jira/browse/OOZIE-2873
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Minor
> Attachments: OOZIE-2873-1.patch
>
>
> Oozie doesn't check EL functions when a coordinator job is submitted.
> Later on, the coordinator action(s) can remain in WAITING state if the user
> has made a mistake in the EL functions.
> For example:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <coordinator-app xmlns="uri:oozie:coordinator:0.4" name="my_coord"
>                  frequency="${coord:hours(1)}" start="${startTime}" end="${endTime}"
>                  timezone="UTC">
>   <controls>
>     <concurrency>1</concurrency>
>     <execution>FIFO</execution>
>   </controls>
>   <datasets>
>     <dataset name="my_dataset" frequency="${coord:hours(1)}"
>              initial-instance="${initInstanceTime}" timezone="UTC">
>       <uri-template>hcat://${HCAT_SERVER}/${HCAT_DB_NAME}/${TABLE_NAME}/dt=${YEAR}${MONTH}${DAY};hr=${HOUR}</uri-template>
>     </dataset>
>   </datasets>
>   <input-events>
>     <data-in name="my_dataset_name" dataset="my_dataset">
>       <instance>${coord:current(0)}</instance>
>     </data-in>
>   </input-events>
>   <action>
>     <workflow>
>       <app-path>${oozieAppWorkflowPath}/my_workflow.xml</app-path>
>       <configuration>
>         <property>
>           <name>yyyymmdd</name>
>           <value>${coord:formatTime(coord:nominalTime(), 'DAY')}</value>
>         </property>
>         <property>
>           <name>hh</name>
>           <value>${coord:formatTime(coord:nominalTime(),'HH')}</value>
>         </property>
>       </configuration>
>     </workflow>
>   </action>
> </coordinator-app>
> {code}
> Oozie then finds that the dependency is available:
> {code}
> 2017-04-25 16:51:53,503 DEBUG DependencyChecker:526 [pool-11-thread-66] -
> SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0100010100101-saley-C]
> ACTION[0100010100101-saley-C@1] Dependency
> [hcat://localhost:9098/my_database/my_table/dt=20170411;hr=02] is available
> {code}
> The issue is with the EL function:
> {code}
> 2017-04-25 16:51:53,506 ERROR CoordPushDependencyCheckXCommand:517
> [pool-11-thread-66] - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
> JOB[0100010100101-saley-C] ACTION[0100010100101-saley-C@1] XException,
> org.apache.oozie.command.CommandException: E1021: Coord Action Input Check
> Error: E1021: Coord Action Input Check Error: Unable to evaluate
> :${coord:formatTime(coord:nominalTime(), 'DAY')}:
> <configuration>
> <property>
> <name>yyyymmdd</name>
> <value>${coord:formatTime(coord:nominalTime(),
> 'DAY')}</value>
> </property>
> {code}
> The coord action remained in WAITING state.
> Solution:
> We should error out at job submission time. Currently users are supposed to
> run a dry run on the coordinator before actually running it, but in practice
> everybody submits the job directly. We should run the dry run by default to
> catch such errors. While working on the fix, I found some buggy test cases
> that would have been caught had we run a dry run first. I am fixing those
> cases as well.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)