[
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128269#comment-14128269
]
Shwetha G S commented on OOZIE-1536:
------------------------------------
To summarise the issue and the whole discussion:
Take a case where a coord action and its corresponding workflow are killed.
There are 2 options for re-running this instance: a coord action re-run or a
workflow re-run. Typically in a prod environment, more than one person
monitors the pipeline, and we can't make sure that they always use the same
option. If the coord action is re-run, Oozie launches a new workflow, and now
there is 1 coord action and 2 workflows for the same nominal time. If someone
then re-runs both workflows, 2 jobs run in parallel for the same nominal time
and generate the same data. This results in inconsistent data, and it's a
nightmare to figure out the issue and fix it.
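To make the failure mode concrete, here is a minimal Java sketch that flags nominal times with more than one workflow attached. The class name, the method, and the (nominal time, workflow id) row layout are hypothetical and chosen only for illustration; they are not Oozie's schema or API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Oozie. */
public class DuplicateWorkflowCheck {

    /**
     * Each row is {nominalTime, workflowId} (an assumed layout).
     * Returns only the nominal times that have two or more workflows.
     */
    static Map<String, List<String>> duplicates(List<String[]> rows) {
        Map<String, List<String>> byTime = new LinkedHashMap<>();
        for (String[] row : rows) {
            byTime.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Drop nominal times backed by a single workflow; those are fine.
        byTime.values().removeIf(ids -> ids.size() < 2);
        return byTime;
    }
}
```

Any nominal time that survives the filter is one where two workflows could be re-run in parallel and write the same data.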
Oozie should make sure that there is a single instance of the workflow for a
coord for a given nominal time. We can achieve this by re-running the old
workflow with all new properties, even in the case of a coord action re-run.
Some of the issues raised with this approach are:
1. Coord action re-run with the refresh option should re-validate the data
sets: This will pick the new definition from COORD_JOBS and re-materialise the
instance. Instead of launching a new workflow, it can re-run the existing
workflow, overriding it with the new properties.
2. If the coord is updated, coord action re-run with refresh should pick the
new definition: Same as 1, and it will work.
3. Case where the workflow path is updated for the coord: Same as 1.
ReRunXCommand (workflow re-run) deletes all entries from WF_ACTIONS (since
skip nodes will not be set for coord action re-run) and runs the workflow like
a fresh workflow.
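The proposal above can be sketched in Java as follows. This is only an illustrative model under stated assumptions: `CoordRerunSketch`, `Workflow`, `CoordAction`, and `rerun` are hypothetical names, and the in-memory map stands in for the database. It does not reproduce Oozie's actual command classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the proposed behaviour, not Oozie code. */
public class CoordRerunSketch {

    /** Stand-in for a workflow job. */
    static final class Workflow {
        final String id;
        final Map<String, String> conf = new HashMap<>();
        int runCount = 0;
        Workflow(String id) { this.id = id; }
    }

    /** Stand-in for a coord action; externalId points at its workflow. */
    static final class CoordAction {
        final String nominalTime;
        String externalId; // null until a workflow has been launched
        CoordAction(String nominalTime) { this.nominalTime = nominalTime; }
    }

    private final Map<String, Workflow> workflows = new HashMap<>();
    private int nextId = 1;

    /** Re-runs the action, reusing its existing workflow when there is one. */
    Workflow rerun(CoordAction action, Map<String, String> newProps) {
        Workflow wf = action.externalId == null ? null : workflows.get(action.externalId);
        if (wf == null) {
            // First launch for this nominal time: create the workflow.
            wf = new Workflow("wf-" + nextId++);
            workflows.put(wf.id, wf);
            action.externalId = wf.id;
        } else {
            // Re-run: discard old state so the workflow starts fresh.
            wf.conf.clear();
        }
        wf.conf.putAll(newProps); // override with the new properties
        wf.runCount++;
        return wf;
    }

    int workflowCount() { return workflows.size(); }
}
```

Calling `rerun` twice on the same action returns the same workflow with the latest properties, so the workflow count for that nominal time stays at 1.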
This will solve a lot of issues:
1. Data inconsistency because of parallel workflows for the same instance.
2. Concurrency handling: Workflow re-run doesn't honour concurrency. Coord
action re-run handles concurrency, but launches a new workflow, which causes
the issue described above.
3. Fewer workflows in the DB, as coord action re-run re-runs the existing
workflow.
[~rohini], [~puru] can you check if you see any issues with this?
> Coordinator action reruns start a new workflow
> ----------------------------------------------
>
> Key: OOZIE-1536
> URL: https://issues.apache.org/jira/browse/OOZIE-1536
> Project: Oozie
> Issue Type: Improvement
> Reporter: Srikanth Sundarrajan
>
> Coordinator action reruns start a new workflow and if existing workflow for
> the action is in running state, the same is not checked. Coord rerun can
> possibly do a workflow re-run to prevent this.