As the author of the first linked PR, I think your points are good. Here is
my attempt to address them:

1: It is possible to do this today if you write a Slack callback. I would
be happy to share my code for this if you're having trouble integrating
Slack. That being said, it would be great if Airflow provided several
"default" callbacks for common platforms like Slack and PagerDuty.
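
For reference, here is a minimal sketch of such a callback. It assumes a
Slack incoming webhook (which accepts a JSON body with a "text" field) and
the five positional arguments that Airflow 1.10's scheduler passes to a
DAG's sla_miss_callback (dag, task_list, blocking_task_list, slas,
blocking_tis). The helper names are made up for this example, and the
callback signature should be checked against the Airflow version you run:

```python
import json
import urllib.request


def build_sla_miss_text(dag_id, slas):
    """Format a plain-text Slack message listing the missed SLAs of one DAG."""
    lines = ["SLA missed in DAG {}:".format(dag_id)]
    for sla in slas:
        # Each SLA-miss record refers to one task_id and one execution_date.
        lines.append("- {} on {}".format(sla.task_id, sla.execution_date.isoformat()))
    return "\n".join(lines)


def make_slack_sla_miss_callback(webhook_url, opener=urllib.request.urlopen):
    """Return a function usable as a DAG's sla_miss_callback (hypothetical helper).

    webhook_url is your Slack incoming-webhook URL; opener is injectable so the
    request can be inspected without touching the network.
    """
    def callback(dag, task_list, blocking_task_list, slas, blocking_tis):
        payload = {"text": build_sla_miss_text(dag.dag_id, slas)}
        request = urllib.request.Request(
            webhook_url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        # Passing data makes this a POST; the response is returned to the caller.
        return opener(request)

    return callback
```

You would then pass something like
sla_miss_callback=make_slack_sla_miss_callback(<your webhook URL>) when
constructing the DAG.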

2/3: Yes, Airflow should add callbacks for the DAG lifecycle, too. I'm less
sure that DAG-level "SLAs" would provide any additional value, and they have
a high chance of being misused.
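
As a rough illustration of what DAG lifecycle callbacks could look like,
here is a hypothetical registry keyed by lifecycle event. The class and the
event names ("started", "succeeded", "failed") are invented for this sketch
and are not an existing Airflow API:

```python
from collections import defaultdict


class DagLifecycleHooks:
    """Hypothetical registry of callbacks keyed by DAG-run lifecycle event."""

    EVENTS = ("started", "succeeded", "failed")

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event, callback):
        """Attach a callback to one lifecycle event; unknown events are rejected."""
        if event not in self.EVENTS:
            raise ValueError("unknown lifecycle event: {}".format(event))
        self._callbacks[event].append(callback)

    def fire(self, event, context):
        """Call every callback registered for event, in registration order."""
        for callback in self._callbacks[event]:
            callback(context)
```

The scheduler would call fire() at each transition of a DAG run; the context
dict stands in for whatever Airflow would actually pass.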

4: That's a great idea. My PR would make adding this very easy, because it
redefines the "SlaMiss" object to carry a "type" of SLA miss. Supporting
this case would mean adding a new type to the enum, plus some logic that
decides when to create an SLA miss of that type.
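
To make the idea concrete, here is a sketch of the minimum-run-time check
under assumed names: the SlaMissType members and the min_duration setting
below are hypothetical and not taken from the PR itself. A run is flagged
only when it finished faster than the configured minimum:

```python
import enum


class SlaMissType(enum.Enum):
    """Hypothetical SLA-miss types; the names in the actual PR may differ."""
    LATE_FINISH = "late_finish"
    SHORT_DURATION = "short_duration"  # new type: the task ended suspiciously early


def check_short_duration(start_date, end_date, min_duration):
    """Return SlaMissType.SHORT_DURATION if the run was shorter than min_duration.

    min_duration is a hypothetical per-task timedelta; None disables the check,
    and a run without both timestamps is never flagged.
    """
    if min_duration is None or start_date is None or end_date is None:
        return None
    if end_date - start_date < min_duration:
        return SlaMissType.SHORT_DURATION
    return None
```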

5: My interpretation is that you mean an email address that always gets
notified, regardless of which more specific users a task says it should
email. (So not a default value for "emails", but an additional value that
is always added.) I think this makes a lot of sense and would be easy to add
for email. It would not be remotely possible for Slack right now, though,
since there's no unified notification code for it.
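
For email, the change could be as small as merging the task's own
recipients with the always-notified addresses before sending. A minimal
sketch, where always_notify stands in for a hypothetical global setting and
duplicates are dropped while keeping the original order:

```python
def _as_list(addresses):
    """Normalise None, a comma/semicolon-separated string, or a list to a list."""
    if not addresses:
        return []
    if isinstance(addresses, str):
        addresses = addresses.replace(";", ",").split(",")
    return [a.strip() for a in addresses if a.strip()]


def resolve_recipients(task_email, always_notify):
    """Task recipients first, then the always-notified ones, without duplicates."""
    combined = _as_list(task_email) + _as_list(always_notify)
    # dict.fromkeys keeps first-seen order (guaranteed since Python 3.7).
    return list(dict.fromkeys(combined))
```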

My preferred approach would be to merge my PR first as a starting point,
since it isolates a lot of this functionality from the scheduler code, and
then create a broader AIP, or possibly a pair of them: one for switching to
a more general evented system for Airflow model lifecycles, and one for
implementing pluggable notifiers (right now a lot of the email functionality
is hardcoded), in the same way that logging is already pluggable.
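
A pluggable notifier could follow the same dotted-path pattern Airflow uses
for its logging configuration (the logging_config_class option): a config
value names a class, and Airflow imports and instantiates it. The
BaseNotifier interface and load_notifier helper below are hypothetical, a
sketch of the pattern rather than a proposed API:

```python
import abc
import importlib


class BaseNotifier(abc.ABC):
    """Hypothetical interface that every notification backend would implement."""

    @abc.abstractmethod
    def notify(self, subject, body, recipients):
        """Deliver one notification; recipients are backend-specific targets."""


def load_notifier(dotted_path):
    """Instantiate a notifier class from a "package.module.ClassName" string."""
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    # Refuse anything that does not implement the notifier interface.
    if not (isinstance(cls, type) and issubclass(cls, BaseNotifier)):
        raise TypeError("{} is not a BaseNotifier subclass".format(dotted_path))
    return cls()
```

Email, Slack, or PagerDuty backends would then each be a BaseNotifier
subclass, selected by changing one config string rather than scheduler code.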

From an SRE perspective, we run into two other pain points: the statsd
integration is subpar (at least when we ingest it into Datadog, it's hard to
actually alert on), and there are no /health or /healthz endpoints for the
scheduler and workers, so it's hard to check their health programmatically.
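
For the health check, one simple approach is to report healthy only while
the component's last heartbeat is recent. The sketch below uses the standard
library's http.server; get_latest_heartbeat is a hypothetical callable (for
example, something that reads the job's latest heartbeat), and the 30-second
threshold is arbitrary:

```python
from datetime import datetime, timedelta, timezone
from http.server import BaseHTTPRequestHandler


def is_healthy(latest_heartbeat, now, max_age):
    """Healthy only if a heartbeat exists and is no older than max_age."""
    return latest_heartbeat is not None and now - latest_heartbeat <= max_age


def make_healthz_handler(get_latest_heartbeat, max_age=timedelta(seconds=30)):
    """Build a handler serving GET /healthz.

    get_latest_heartbeat (hypothetical) returns a timezone-aware UTC datetime
    or None if the component has never heartbeated.
    """
    class HealthzHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            now = datetime.now(timezone.utc)
            healthy = is_healthy(get_latest_heartbeat(), now, max_age)
            body = b"ok\n" if healthy else b"unhealthy\n"
            # 503 lets load balancers and probes act on the status code alone.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthzHandler
```

Serving it would look like
HTTPServer(("0.0.0.0", port), make_healthz_handler(...)).serve_forever().
Because an unhealthy component returns 503, a Kubernetes liveness probe or a
load balancer can act on it without parsing the body.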

On Wed, Nov 14, 2018 at 1:06 PM Niels Zeilemaker <ni...@zeilemaker.nl> wrote:

> I once had a go at introducing something similar, but it never got merged.
> Maybe you can use it as inspiration.
>
> https://github.com/apache/incubator-airflow/pull/2412
>
> Niels
>
> On Wed, Nov 14, 2018 16:43 Sai Phanindhra <phani8...@gmail.com> wrote:
>
> > The above-mentioned PR addresses issues/bugs in the current functionality.
> > I want to add more alerting channels, including for SLA alerts.
> >
> > On Wed, 14 Nov 2018 at 20:51, airflowuser <airflowu...@protonmail.com.invalid> wrote:
> >
> > > There is a pending PR to refactor the SLA:
> > > https://github.com/apache/incubator-airflow/pull/3584
> > >
> > > But it requires more reviews from committers.
> > >
> > >
> > >
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > On Wednesday, November 14, 2018 5:11 PM, Sai Phanindhra <phani8...@gmail.com> wrote:
> > >
> > > > Hello Airflow committers and maintainers,
> > > > I came across SLAs in Airflow. It's a very good feature to begin
> > > > with, but I feel a few enhancements could be made. These
> > > > enhancements are not limited to SLAs; they are basically gaps I
> > > > noticed while using Airflow. I'm listing a few of them here.
> > > >
> > > > 1.  SLA alerts to Slack channel(s) in addition to emails.
> > > > 2.  Alerts at the DAG level (start, success, and failure).
> > > > 3.  Custom callbacks at the DAG level, just like `on_failure_callback`,
> > > >     `on_retry_callback` and `on_success_callback`.
> > > > 4.  Alerts if a task completes before a minimum run time. (This is
> > > >     a rare case, but there are a few long-running jobs that we know
> > > >     for sure run for at least a few hours, and if they exit before
> > > >     that, something is wrong. We need warning alerts for such cases.)
> > > >
> > > > 5.  A default/global alert config (default emails to send all alerts
> > > >     to, and/or a Slack channel to send alerts to).
> > > >
> > > > Some of these might have already been solved, or someone may be
> > > > working on them. Please share your thoughts and add anything else I
> > > > missed to this list.
> > > >
> > >
> > >
> > >
> >
> > --
> > Sai Phanindhra,
> > Ph: +91 9043258999
> >
>
