[ 
https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050625#comment-14050625
 ] 

Mona Chitnis commented on OOZIE-1913:
-------------------------------------

Some discussion points:

h5. Approach 1:
Change SLA behavior for all jobs on suspend, i.e. do not track SLA for 
suspended jobs. However, the current behavior was put in place because users 
need to be notified of their job SLAs when a suspension is caused by the system 
(Oozie server restart / transient errors from the Hadoop cluster). So making 
this change across all suspended jobs would not be ideal.

h5. Approach 2:
Add a command line option like {{-ignoresla}} to the suspend command, which 
will flag the job accordingly in the memory map of the SLA calculator. This 
then entails two sub-approaches:

h6. 2A]
On seeing {{-ignoresla}}, set the eventProcessed byte of the SLA entry to 
{{1000}} (8) so that it is no longer tracked for SLA. The resume command will 
then also need an option like {{-resumesla}} to add the job back into the SLA 
map for tracking, along with further options for the revised expected end time 
and expected duration of the job.
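
To make 2A a bit more concrete, here is a rough sketch in Java. It is not Oozie's actual SLA calculator: the class {{SlaMapSketch}}, its methods and the {{summaryStore}} / {{trackedJobs}} names are all hypothetical, and resetting eventProcessed to 0 on resume is only an assumption about how the recalculation might be handled.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of approach 2A; not Oozie's real SLA calculator. */
final class SlaMapSketch {
    /** Binary 1000 (8): the value proposed for "no longer tracked". */
    static final byte ALL_PROCESSED = 0b1000;

    /** Minimal stand-in for an SLA entry. */
    static final class SlaEntry {
        byte eventProcessed;          // flags for SLA events already handled
        long expectedEndMillis;       // expected end time
        long expectedDurationMillis;  // expected duration

        SlaEntry(long expectedEndMillis, long expectedDurationMillis) {
            this.expectedEndMillis = expectedEndMillis;
            this.expectedDurationMillis = expectedDurationMillis;
        }
    }

    // Stand-in for the persisted SLA entries (assumption).
    private final Map<String, SlaEntry> summaryStore = new ConcurrentHashMap<>();
    // Stand-in for the in-memory SLA map: ids of jobs currently tracked.
    private final Set<String> trackedJobs = ConcurrentHashMap.newKeySet();

    void register(String jobId, long expectedEndMillis, long expectedDurationMillis) {
        summaryStore.put(jobId, new SlaEntry(expectedEndMillis, expectedDurationMillis));
        trackedJobs.add(jobId);
    }

    /** Suspend; with ignoreSla set, the job drops out of SLA tracking. */
    void suspend(String jobId, boolean ignoreSla) {
        SlaEntry entry = summaryStore.get(jobId);
        if (entry == null || !ignoreSla) {
            return; // plain suspend: SLA tracking continues as today
        }
        entry.eventProcessed = ALL_PROCESSED;
        trackedJobs.remove(jobId);
    }

    /** Resume; with resumeSla set, track the job again with revised expectations. */
    void resume(String jobId, boolean resumeSla, long revisedEndMillis, long revisedDurationMillis) {
        SlaEntry entry = summaryStore.get(jobId);
        if (entry == null || !resumeSla) {
            return;
        }
        entry.eventProcessed = 0; // assumption: start over rather than restore old flags
        entry.expectedEndMillis = revisedEndMillis;
        entry.expectedDurationMillis = revisedDurationMillis;
        trackedJobs.add(jobId);
    }

    boolean isTracked(String jobId) {
        return trackedJobs.contains(jobId);
    }

    byte eventProcessed(String jobId) {
        return summaryStore.get(jobId).eventProcessed;
    }

    long expectedEnd(String jobId) {
        return summaryStore.get(jobId).expectedEndMillis;
    }
}
```

The point of the sketch is that the previous eventProcessed value is lost on suspend, which is why resume has to take revised expectations and start tracking from scratch.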

h6. 2B]
If we don't wish to change the eventProcessed byte, so that we don't have to 
recalculate it, we can add a flag to the job indicating that SLA should be 
ignored for this job until the flag is unset. However, this requires adding a 
column to the Sla_Summary table schema so that this information is retained 
across Oozie server restarts and in HA mode.
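
For comparison, 2B could look roughly like the following. Again this is only a sketch: {{SlaIgnoreFlagSketch}}, the {{slaIgnored}} field (standing in for the new Sla_Summary column) and {{shouldAlert}} are hypothetical names, not existing Oozie code.

```java
/** Hypothetical sketch of approach 2B; not Oozie's real SLA summary bean. */
final class SlaIgnoreFlagSketch {

    /** Minimal stand-in for one row of the SLA summary table. */
    static final class SlaSummary {
        final String jobId;
        byte eventProcessed;  // left untouched by suspend/resume in this approach
        boolean slaIgnored;   // stands in for the proposed new column

        SlaSummary(String jobId, byte eventProcessed) {
            this.jobId = jobId;
            this.eventProcessed = eventProcessed;
        }
    }

    /** Suspend with the ignore option: only the flag changes. */
    static void ignoreSla(SlaSummary summary) {
        summary.slaIgnored = true;
    }

    /** Resume with SLA re-enabled: unset the flag, keep eventProcessed as it was. */
    static void resumeSla(SlaSummary summary) {
        summary.slaIgnored = false;
    }

    /** SLA alerts are sent only while the flag is unset. */
    static boolean shouldAlert(SlaSummary summary) {
        return !summary.slaIgnored;
    }
}
```

Here nothing has to be recalculated on resume, but the flag only survives a server restart or HA failover if it is persisted, hence the extra column.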

2A seems to be preferable to me. Thoughts?


> Devise a way to turn off SLA alerts when bundle/coordinator suspended
> ---------------------------------------------------------------------
>
>                 Key: OOZIE-1913
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1913
>             Project: Oozie
>          Issue Type: Improvement
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Mona Chitnis
>             Fix For: trunk
>
>
> From user:
> Need to turn off the SLA miss alerts in jobs when the bundle is suspended for
> grid upgrades and similar work so that when it's resumed we aren't flooded 
> with a bunch of alerts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
