[
https://issues.apache.org/jira/browse/OOZIE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033104#comment-14033104
]
Alejandro Abdelnur commented on OOZIE-1879:
-------------------------------------------
+1 LGTM. One minor nit in LiteWorkflowInstance.java: {{if (numActionEndTimes >= 0) {}}
will work, but strictly speaking it should be {{>}}.
> Workflow Rerun causes error depending on the order of forked nodes
> ------------------------------------------------------------------
>
> Key: OOZIE-1879
> URL: https://issues.apache.org/jira/browse/OOZIE-1879
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Priority: Blocker
> Attachments: OOZIE-1879.patch
>
>
> Suppose you have a workflow like this:
> {noformat}
> start --> fork
> fork --> shell1, shell2
> shell1 --> join
> shell2 --> join
> join --> shell3
> shell3 --> end
> {noformat}
> Suppose every action except shell3 succeeds.
> If you fix the problem with shell3 and then do a rerun, one of the following
> two outcomes can happen:
> # If shell1 finished before shell2, then the rerun succeeds
> # If shell2 finished before shell1, then the rerun fails
> The error in the second outcome is simply this log message:
> {noformat}
> 2014-05-29 17:17:03,735 ERROR
> org.apache.oozie.workflow.lite.LiteWorkflowInstance:
> SERVER[cdh5-1.cloudera.local] USER[pdvorak] GROUP[-] TOKEN[]
> APP[test-rerun-wf] JOB[0000004-140521220856264-oozie-oozi-W]
> ACTION[0000004-140521220856264-oozie-oozi-W@join] invalid execution path
> [/shell1/]
> {noformat}
> After a bunch of digging, I discovered that during a rerun of the above
> workflow (or similar workflows), LiteWorkflowInstance#signal gets called for
> each action in the fork node in the order in which they are listed in the fork
> node's XML; during the original run, however, LiteWorkflowInstance#signal
> gets called for each action in the order in which they complete (i.e. by endTime).
> When the two orders don't match, you get the above error. The general fix is
> therefore to ensure that during a rerun, LiteWorkflowInstance#signal gets
> called for each action in the fork node in the order in which they originally
> completed. And if you think about it, that is more correct than the current
> behavior anyway.
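
The ordering idea in the description can be sketched in isolation. This is a minimal illustration, not the code from OOZIE-1879.patch: the class {{RerunSignalOrder}}, the method {{orderByOriginalEndTime}}, and the end-time map are hypothetical names made up for the example, and the end times are invented values. It only shows how the XML order and the original completion order can differ, and how sorting by recorded end time recovers the latter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class RerunSignalOrder {

    // Hypothetical helper (not an Oozie API): returns the fork's action names
    // sorted by the end time recorded during the original run.
    static List<String> orderByOriginalEndTime(List<String> xmlOrder,
                                               Map<String, Long> endTimes) {
        List<String> ordered = new ArrayList<>(xmlOrder);
        // List.sort is stable, so actions with equal end times keep their XML
        // order. Actions with no recorded end time are placed last.
        ordered.sort(Comparator.comparingLong(
                (String name) -> endTimes.getOrDefault(name, Long.MAX_VALUE)));
        return ordered;
    }

    public static void main(String[] args) {
        // The fork lists shell1 before shell2 in its XML...
        List<String> xmlOrder = List.of("shell1", "shell2");
        // ...but in this made-up original run, shell2 finished first.
        Map<String, Long> endTimes = Map.of("shell1", 2000L, "shell2", 1000L);

        // Prints [shell2, shell1]: the order in which a rerun would need to
        // signal the actions to follow the original execution path.
        System.out.println(orderByOriginalEndTime(xmlOrder, endTimes));
    }
}
```

In the failing case from the description (shell2 finished before shell1), signalling in XML order gives shell1, shell2, while the sketch gives shell2, shell1, matching the original run.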