[ 
https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050910#comment-14050910
 ] 

Mona Chitnis commented on OOZIE-1913:
-------------------------------------

Expanding the scope of this problem:

The following scenarios and use cases can be tied in with turning off SLA 
alerts:

h6. [1] Suspend: 
User-initiated suspend of a bundle/coordinator, with an option to turn off 
SLA alerts. For a coordinator, the option can be followed by a list of 
coordinator actions; otherwise it applies to ALL actions. For a bundle, there 
is currently no way to enumerate coordinators, so SLA alerting would be turned 
off for all child coordinators.

h6. [2] Rerun: 
For reprocessing purposes, the same option as above can be given when 
rerunning a coordinator, turning off SLA alerts for some or all of its actions.

h6. [3] Catchup jobs: 
In backlogged situations, similar to comment#5, the SLA service should 
identify that a coordinator is catching up and disable alerting automatically. 
Of course, there should be a job-level minimum "threshold" in terms of time 
for SLA to mark jobs as catchup, e.g. only turn off alerts if the nominal time 
is more than 3 days old, or something to that effect. Need to think about 
whether to specify this threshold as something like 
"oozie.coordinator.sla.alert.disable.threshold" in job.properties at 
submission time.
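
A minimal sketch of the catch-up check described in [3], not an existing Oozie implementation. It assumes the threshold is given in minutes under the property name floated above; the class and method names (SlaCatchupSketch, readThreshold, alertsDisabled) are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Properties;

final class SlaCatchupSketch {
    // Property name proposed in this comment; not an existing Oozie setting.
    static final String THRESHOLD_KEY = "oozie.coordinator.sla.alert.disable.threshold";

    // Assumption: the value is a whole number of minutes.
    // Returns null when the job did not set a threshold.
    static Duration readThreshold(Properties jobProps) {
        String raw = jobProps.getProperty(THRESHOLD_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        return Duration.ofMinutes(Long.parseLong(raw.trim()));
    }

    // An action counts as "catch-up" when its nominal time lies further in the
    // past than the threshold. Without a threshold, alerts are never disabled.
    static boolean alertsDisabled(Instant nominalTime, Instant now, Duration threshold) {
        if (threshold == null) {
            return false;
        }
        return Duration.between(nominalTime, now).compareTo(threshold) > 0;
    }
}
```

With a threshold of 4320 minutes (3 days), an action whose nominal time is 9 days old would have alerts suppressed, while one that is 1 day old would still alert.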

In cases [1]-[3], SLA calculation will go ahead and mark the eventual MET/MISS 
status for the jobs; only no alerts will be generated. Use case [4] lists an 
option if you would like to "resume" SLA tracking for jobs that you are 
re-processing.
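
The separation between SLA calculation and alerting in cases [1]-[3] could look roughly like the sketch below. It is illustrative only: the class SlaAlertGateSketch, its process method, and the in-memory alert list are hypothetical and do not reflect Oozie's actual SLA service classes.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class SlaAlertGateSketch {
    enum SlaStatus { MET, MISS }

    // Stand-in for whatever channel would deliver alerts (email, JMS, ...).
    final List<String> sentAlerts = new ArrayList<>();

    // The status is always computed and returned; only the alert is gated
    // by the per-entry "alerts disabled" tag.
    SlaStatus process(String actionId, Instant expectedEnd, Instant actualEnd,
                      boolean alertsDisabled) {
        SlaStatus status = actualEnd.isAfter(expectedEnd) ? SlaStatus.MISS : SlaStatus.MET;
        if (status == SlaStatus.MISS && !alertsDisabled) {
            sentAlerts.add(actionId + " MISS");
        }
        return status;
    }
}
```

Keeping the tag out of the status computation means the recorded MET/MISS outcome is the same whether or not alerting was disabled.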

h6. [4] On-the-fly update of SLA expected-start, expected-end and expected-duration: 
Similar to the new feature in Oozie where you can change certain coordinator 
config such as concurrency, throttle, etc. on the fly, one should be able to 
change the SLA limits given. This can be offered as an option to 'Rerun' of 
terminated coordinators, 'Resume' of suspended coordinators, or the 'Change' 
command.

"Tagging" a certain SLA entry for 'disabling alerts' is not going to require 
any XML changes on the part of the user. This is due to a good design choice we 
made while implementing SLA.

> Devise a way to turn off SLA alerts when bundle/coordinator suspended
> ---------------------------------------------------------------------
>
>                 Key: OOZIE-1913
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1913
>             Project: Oozie
>          Issue Type: Improvement
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Mona Chitnis
>             Fix For: trunk
>
>
> From user:
> Need to turn off the SLA miss alerts in jobs when the bundle is suspended for
> grid upgrades and similar work so that when it's resumed we aren't flooded 
> with a bunch of alerts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
