[ 
https://issues.apache.org/jira/browse/OOZIE-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931432#comment-13931432
 ] 

Shwetha G S commented on OOZIE-1735:
------------------------------------

Can we categorise failures as retriable and non-retriable, marking the coord as 
FAILED for non-retriable errors and KILLED for retriable ones? We would allow 
re-runs on KILLED coords, but not on FAILED ones. For example, EL errors should 
fail the coord, since no re-run will help, whereas DB/network connectivity 
issues should kill the coord and allow re-runs. With this approach, the status 
clearly signifies the kind of error, and the user can do blind re-runs on 
killed coords.
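
A minimal sketch of the categorisation proposed above, in Java. All names here 
(CoordFailureClassifier, TerminalStatus, ElEvaluationException) are 
hypothetical and not taken from the Oozie code base. Treating SQLException and 
IOException as the "DB/network" signal, and treating every unrecognised error 
as non-retriable, are assumptions made for illustration only.

```java
import java.io.IOException;
import java.sql.SQLException;

// Hypothetical sketch; none of these types are Oozie's actual API.
final class CoordFailureClassifier {

    // Terminal status proposed in the comment: the status itself tells the
    // user whether a re-run can help.
    enum TerminalStatus {
        FAILED(false), // non-retriable: a re-run will not help
        KILLED(true);  // retriable: a blind re-run is allowed

        private final boolean rerunAllowed;

        TerminalStatus(boolean rerunAllowed) {
            this.rerunAllowed = rerunAllowed;
        }

        boolean isRerunAllowed() {
            return rerunAllowed;
        }
    }

    // Stand-in for an EL evaluation error (assumed name).
    static final class ElEvaluationException extends RuntimeException {
        ElEvaluationException(String message) {
            super(message);
        }
    }

    // Walks the cause chain. DB or network errors anywhere in the chain make
    // the failure retriable (KILLED); everything else, including EL errors,
    // is treated as non-retriable (FAILED).
    static TerminalStatus classify(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SQLException || t instanceof IOException) {
                return TerminalStatus.KILLED;
            }
        }
        return TerminalStatus.FAILED;
    }
}
```

With this sketch, classify(new SQLException("timeout")) yields KILLED, so a 
re-run would be allowed, while an ElEvaluationException yields FAILED. 
Defaulting unknown errors to FAILED is the conservative choice; the opposite 
default would also be consistent with the proposal.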

> Support re-running of failed coordinator and coordinator action
> ---------------------------------------------------------------
>
>                 Key: OOZIE-1735
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1735
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: purshotam shah
>            Assignee: purshotam shah
>
> We should support re-running failed jobs. Jobs are set to FAILED if there is a 
> runtime error (such as an SQL timeout).
> In the current scenario there is no way to recover besides running SQL.
> A rerun should set the coord status to RUNNING, set pending to 1, reset 
> doneMaterialization, and set last modified to the current time, so that 
> materialization continues.
> We should also provide an option to resume a failed action. The behavior will 
> be the same as the killed option.
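
The state reset described in the issue can be sketched in Java as follows. 
This is an illustration only: CoordRerunSketch, CoordJob and its fields are 
hypothetical stand-ins, not Oozie's actual bean or command classes. The guard 
that accepts only FAILED jobs follows the issue description and is an 
assumption.

```java
import java.time.Instant;

// Hypothetical sketch; not Oozie's actual classes.
final class CoordRerunSketch {

    enum Status { RUNNING, FAILED, KILLED }

    // Minimal stand-in for the coordinator job record.
    static final class CoordJob {
        Status status = Status.RUNNING;
        int pending = 0;
        boolean doneMaterialization = false;
        Instant lastModified = Instant.EPOCH;
    }

    // Applies the reset from the issue description so that materialization
    // can continue: status RUNNING, pending 1, doneMaterialization cleared,
    // last modified set to the supplied current time.
    static void rerun(CoordJob job, Instant now) {
        if (job.status != Status.FAILED) {
            throw new IllegalStateException("only FAILED jobs are re-run in this sketch");
        }
        job.status = Status.RUNNING;
        job.pending = 1;
        job.doneMaterialization = false;
        job.lastModified = now;
    }
}
```

Passing the current time in as a parameter, rather than reading the clock 
inside rerun, keeps the sketch easy to test.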



--
This message was sent by Atlassian JIRA
(v6.2#6252)