[ https://issues.apache.org/jira/browse/OOZIE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dénes Bodó updated OOZIE-548:
-----------------------------
    Summary: OOZIE-131: Support WF action level retry  (was: OOZIE-131: Support WF action level rery)

> OOZIE-131: Support WF action level retry
> ----------------------------------------
>
>                 Key: OOZIE-548
>                 URL: https://issues.apache.org/jira/browse/OOZIE-548
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Mohammad Islam
>            Assignee: Roman Shaposhnik
>            Priority: Major
>
> While Hadoop task-level retry and Oozie-level retry already exist for
> transient errors, it is desirable to also allow user-configured retry at the
> WF action level.
> In this proposed task, the following sub-tasks need to be considered:
> 1. Enable the user to specify the retry count and the retry interval (the
> time between two successive tries).
> 2. The retry interval will be in minutes, with a default value of 10 minutes.
> The default value should be a system-level configuration.
> 3. The default retry count is 0 (no retry), to stay backward compatible.
> 4. A new state called "RETRY" will be added for WF actions. An action will be
> in the RETRY state if the job failed and needs to be retried.
> 5. Three fields need to be added to the WF action table: retry_count,
> max_retry, retry_interval.
> 6. A service such as the Recovery service will periodically run the
> following SQL: "select action_id from WF_ACTIONS where status = 'RETRY' and
> (last_modified_time + retry_interval) < current_time and max_retry >
> retry_count" and queue a RETRY_COMMAND for each result. The last filter of
> the SQL might not be required.
> 7. RETRY_COMMAND will update the status from RETRY to PREP and push an
> ActionStartXCommand.
> Open questions:
> a) Who will remove the temporary directories/files (such as ACTION_DIR)
> created by Oozie? Is that part of the job's move to the RETRY state, or could
> RETRY_COMMAND do it?
> b) Do we need to keep historical information, such as why the previous
> retries failed? Historical information includes the error code, error
> message, etc.
> c) Anything else?
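
To make sub-tasks 4 to 7 concrete, here is a minimal sketch of the proposed state handling, written in Python against an in-memory SQLite table rather than Oozie's actual Java code and database. The table name, the three new columns, and the RETRY and PREP states come from the proposal above. Everything else is an assumption for illustration: the function names, the use of epoch seconds for last_modified_time, the "ERROR" state for an action that has run out of retries, and the choice to increment retry_count when the retry is issued.

```python
import sqlite3

# Columns retry_count, max_retry and retry_interval follow sub-task 5.
# Defaults follow sub-tasks 2 and 3: 10 minutes, 0 retries.
SCHEMA = """
CREATE TABLE WF_ACTIONS (
    action_id          TEXT PRIMARY KEY,
    status             TEXT NOT NULL,
    last_modified_time INTEGER NOT NULL,            -- epoch seconds (assumption)
    retry_count        INTEGER NOT NULL DEFAULT 0,
    max_retry          INTEGER NOT NULL DEFAULT 0,
    retry_interval     INTEGER NOT NULL DEFAULT 10  -- minutes
)
"""


def on_action_failed(conn, action_id, now):
    """Sub-task 4: a failed action goes to RETRY while retries remain.

    The 'ERROR' fallback is a placeholder; the proposal does not name the
    terminal state.
    """
    retry_count, max_retry = conn.execute(
        "SELECT retry_count, max_retry FROM WF_ACTIONS WHERE action_id = ?",
        (action_id,),
    ).fetchone()
    status = "RETRY" if retry_count < max_retry else "ERROR"
    conn.execute(
        "UPDATE WF_ACTIONS SET status = ?, last_modified_time = ? "
        "WHERE action_id = ?",
        (status, now, action_id),
    )
    return status


def find_due_retries(conn, now):
    """Sub-task 6: the periodic query, with the interval converted to seconds."""
    rows = conn.execute(
        "SELECT action_id FROM WF_ACTIONS "
        "WHERE status = 'RETRY' "
        "AND (last_modified_time + retry_interval * 60) < ? "
        "AND max_retry > retry_count "
        "ORDER BY action_id",
        (now,),
    )
    return [row[0] for row in rows]


def retry_command(conn, action_id, now):
    """Sub-task 7: move RETRY back to PREP; the caller would then start the action.

    Incrementing retry_count here is an assumption about where the counter
    is maintained.
    """
    conn.execute(
        "UPDATE WF_ACTIONS SET status = 'PREP', retry_count = retry_count + 1, "
        "last_modified_time = ? WHERE action_id = ? AND status = 'RETRY'",
        (now, action_id),
    )
```

In this sketch an action only enters RETRY when retry_count is still below max_retry, so the final filter in the query is redundant, which matches the remark in sub-task 6 that it might not be required.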