[
https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050910#comment-14050910
]
Mona Chitnis commented on OOZIE-1913:
-------------------------------------
Expanding the scope of this problem:
The following scenarios and use cases can be tied in with turning off SLA
alerts:
h6. [1] Suspend:
User-initiated suspend of a bundle/coordinator, with an option to turn off
SLA alerts. For a coordinator, the option can be followed by a list of
coordinator actions; otherwise it applies to ALL actions. For a bundle, there
is currently no way to enumerate coordinators, so SLA alerting would be turned
off for all child coordinators.
h6. [2] Rerun:
For reprocessing purposes, the same option as above can be given when
rerunning a coordinator, turning off SLA alerts for some or all of its
actions.
h6. [3] Catchup jobs:
In backlogged situations, similar to comment#5, the SLA service should
identify that a coordinator is catching up and disable alerting automatically.
Of course, there should be a job-level minimum "threshold" in terms of time
for SLA to mark jobs as catchup, e.g. only turn off alerts if the nominal time
is more than 3 days old, or something to that effect. Need to think about
whether to specify this threshold, e.g. as
"oozie.coordinator.sla.alert.disable.threshold", as part of job.properties at
submission time.
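A rough sketch of the threshold check described in [3], written in Python for
brevity. The function name `is_catchup` and the 3-day default are illustrative
assumptions, not existing Oozie code, and the property name above is still only
a proposal.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default: the comment only mentions "3 days" as an example value.
DEFAULT_THRESHOLD = timedelta(days=3)


def is_catchup(nominal_time, now, threshold=DEFAULT_THRESHOLD):
    """Return True when the action's nominal time is older than the threshold.

    A True result would mean: treat the action as catch-up and suppress alerts.
    """
    return now - nominal_time > threshold


if __name__ == "__main__":
    now = datetime(2014, 7, 3, tzinfo=timezone.utc)
    print(is_catchup(now - timedelta(days=4), now))  # backlogged action
    print(is_catchup(now - timedelta(hours=6), now))  # recent action
```

Keeping the threshold as a parameter leaves room for a per-job value read from
job.properties, if that is the route chosen.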
In cases [1]-[3], SLA calculation still goes ahead and marks the eventual
MET/MISS status for the jobs; only the alerts are suppressed. Use case [4]
describes an option for "resuming" SLA tracking for jobs that you are
re-processing.
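To illustrate the behaviour for cases [1]-[3], here is a minimal sketch in
which the MET/MISS status is always computed and only the alert is gated by a
flag. `SlaEntry`, `alerts_disabled` and `evaluate` are hypothetical names, and
the check is reduced to expected-end only; this is not how the Oozie SLA
service is actually structured.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SlaEntry:
    # Hypothetical, simplified SLA record: only the expected end is tracked.
    expected_end: datetime
    alerts_disabled: bool = False


def evaluate(entry, actual_end):
    """Return (status, send_alert) for a finished job.

    The status is computed regardless of the flag; the flag only decides
    whether a MISS results in an alert.
    """
    status = "MET" if actual_end <= entry.expected_end else "MISS"
    send_alert = status == "MISS" and not entry.alerts_disabled
    return status, send_alert
```

With `alerts_disabled=True`, a late job is still recorded as a MISS, but no
alert would go out.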
h6. [4] On-the-fly update of SLA expected-start, expected-end and expected-duration:
Similar to the new feature in Oozie where you can change certain coordinator
settings such as concurrency and throttle on the fly, one should be able to
change the configured SLA limits. This can be offered as an option on 'Rerun'
of terminated coordinators, 'Resume' of suspended coordinators, or the
'Change' command.
"Tagging" a certain SLA entry for 'disabling alerts' is not going to require
any XML changes on the part of the user. This is due to a good design choice we
made while implementing SLA.
> Devise a way to turn off SLA alerts when bundle/coordinator suspended
> ---------------------------------------------------------------------
>
> Key: OOZIE-1913
> URL: https://issues.apache.org/jira/browse/OOZIE-1913
> Project: Oozie
> Issue Type: Improvement
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Assignee: Mona Chitnis
> Fix For: trunk
>
>
> From user:
> Need to turn off the SLA miss alerts in jobs when the bundle is suspended for
> grid upgrades and similar work so that when it's resumed we aren't flooded
> with a bunch of alerts.
--
This message was sent by Atlassian JIRA
(v6.2#6252)