[
https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050625#comment-14050625
]
Mona Chitnis commented on OOZIE-1913:
-------------------------------------
Some discussion points:
h5. Approach 1:
Change SLA behavior for all jobs on suspend, i.e. do not track SLA for suspended
jobs. However, the current behavior was put in place because users need to be
notified of their job SLAs when a suspension is caused by the system (Oozie
server restart or transient errors from the Hadoop cluster). So making this change
across all suspended jobs would not be ideal.
h5. Approach 2:
Add a command-line option like {{-ignoresla}} to the suspend command, which
will flag the job accordingly in the in-memory map of the SLA calculator. This then
entails two sub-approaches:
h6. 2A]
On seeing {{-ignoresla}}, set the eventProcessed byte of the SLA entry to
{{1000 (8)}} so that the job is no longer tracked for SLA. The resume
command will also need an option like {{-resumesla}} to add the job back
into the SLA map for tracking, along with further options for a revised expected end
time and expected duration of the job.
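
To make 2A easier to discuss, here is a minimal sketch of the suspend/resume flow. It is not taken from the Oozie code base: the class, field and method names are hypothetical, the SLA map is modelled as a plain concurrent map, and resetting eventProcessed to 0 on resume is a simplifying assumption (a real implementation would have to recalculate it).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of approach 2A; names are illustrative, not Oozie APIs. */
class SlaMapSketch {
    /** Marker from the proposal: binary 1000 (decimal 8) means "no longer tracked". */
    static final byte SLA_DONE = 0b1000;

    static final class SlaEntry {
        final String jobId;
        byte eventProcessed;           // bit flags of SLA events already handled
        long expectedEndMillis;
        long expectedDurationMillis;

        SlaEntry(String jobId, long expectedEndMillis, long expectedDurationMillis) {
            this.jobId = jobId;
            this.expectedEndMillis = expectedEndMillis;
            this.expectedDurationMillis = expectedDurationMillis;
        }
    }

    /** Stand-in for the SLA calculator's in-memory map. */
    private final Map<String, SlaEntry> tracked = new ConcurrentHashMap<>();

    void register(SlaEntry entry) {
        tracked.put(entry.jobId, entry);
    }

    boolean isTracked(String jobId) {
        return tracked.containsKey(jobId);
    }

    /** Suspend; with ignoreSla the entry is marked done and dropped from the map. */
    SlaEntry suspend(String jobId, boolean ignoreSla) {
        if (!ignoreSla) {
            return tracked.get(jobId); // plain suspend: SLA tracking continues
        }
        SlaEntry entry = tracked.remove(jobId);
        if (entry != null) {
            entry.eventProcessed = SLA_DONE;
        }
        return entry;
    }

    /** Resume; with resumeSla the entry re-enters the map with revised expectations. */
    void resume(SlaEntry entry, boolean resumeSla, long newEndMillis, long newDurationMillis) {
        if (!resumeSla) {
            return; // job runs again but stays untracked
        }
        entry.eventProcessed = 0; // assumption: start over instead of recalculating
        entry.expectedEndMillis = newEndMillis;
        entry.expectedDurationMillis = newDurationMillis;
        tracked.put(entry.jobId, entry);
    }
}
```

The sketch shows why the resume side needs extra inputs: once the entry has left the map, the old expected end time and duration are stale and must be supplied again.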
h6. 2B]
If we don't wish to change the eventProcessed byte, so that we don't have to
recalculate it, we can add a flag to the job indicating that SLA should be ignored for
this job until the flag is unset. However, this requires adding a column to the Sla_Summary
table schema to retain this information across Oozie server restarts
and in HA mode.
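
For comparison, a rough sketch of 2B under the same caveats: all names are hypothetical, and a plain map stands in for the Sla_Summary table with its additional column. The point is only that eventProcessed is left untouched and the alert check consults the persisted flag.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of approach 2B; names are illustrative, not Oozie APIs. */
class SlaFlagSketch {
    /** Stand-in for one summary row, including the assumed new column. */
    static final class SummaryRow {
        final String jobId;
        final byte eventProcessed; // never modified by suspend/resume in 2B
        boolean ignoreSla;         // the additional column this approach needs

        SummaryRow(String jobId, byte eventProcessed) {
            this.jobId = jobId;
            this.eventProcessed = eventProcessed;
        }
    }

    /** Stand-in for the database table; real code would go through persistence. */
    private final Map<String, SummaryRow> summaryTable = new HashMap<>();

    void save(SummaryRow row) {
        summaryTable.put(row.jobId, row);
    }

    /** Called on suspend with -ignoresla (true) and when the flag is unset (false). */
    void setIgnoreSla(String jobId, boolean ignore) {
        SummaryRow row = summaryTable.get(jobId);
        if (row != null) {
            row.ignoreSla = ignore;
        }
    }

    /** The SLA check skips alerting while the flag is set. */
    boolean shouldAlert(String jobId) {
        SummaryRow row = summaryTable.get(jobId);
        return row != null && !row.ignoreSla;
    }
}
```

Because the flag lives with the persisted row rather than only in memory, any server that loads the row after a restart or in HA mode would see the same setting.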
2A seems to be preferable to me. Thoughts?
> Devise a way to turn off SLA alerts when bundle/coordinator suspended
> ---------------------------------------------------------------------
>
> Key: OOZIE-1913
> URL: https://issues.apache.org/jira/browse/OOZIE-1913
> Project: Oozie
> Issue Type: Improvement
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Assignee: Mona Chitnis
> Fix For: trunk
>
>
> From user:
> Need to turn off the SLA miss alerts in jobs when the bundle is suspended for
> grid upgrades and similar work so that when it's resumed we aren't flooded
> with a bunch of alerts.
--
This message was sent by Atlassian JIRA
(v6.2#6252)