Thanks, James, for the input.
For the problems I described above, I built hacky solutions: adding a
`*slack_start_notification_operation*` at the beginning, a
`*slack_end_notification_operator*` at the end, and a
`*slack_failed_notification_operation*` that runs when an upstream task
fails. This addresses the first three issues/feature requirements I listed.
For point 5, I maintain lists of emails and join all the required ones at
the DAG level. Still, this is manual work that has to be repeated every
time a new DAG is onboarded to Airflow. I think these are common problems
that many Airflow users/developers face.
@James <jmeic...@quantopian.com> let's catch up sometime on Slack/Hangouts to
discuss how these enhancements can be done.
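
To make the workaround above concrete, here is a minimal sketch in plain
Python of its two pieces: joining email lists at the DAG level and building
the start/end/failed Slack messages. It is illustrative only. The names
`GLOBAL_ALERT_EMAILS`, `merge_emails`, `build_slack_payload`, `post_json` and
`notify_slack` are hypothetical, the address is a placeholder, and it assumes
a Slack incoming webhook that accepts a JSON body with a `text` field.

```python
import json
from urllib import request

# Hypothetical global recipients that should receive every alert (point 5).
GLOBAL_ALERT_EMAILS = ["data-alerts@example.com"]


def merge_emails(*email_lists):
    """Join several email lists, dropping blanks and case-insensitive duplicates."""
    seen = set()
    merged = []
    for emails in email_lists:
        for email in emails:
            cleaned = email.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


def build_slack_payload(event, dag_id, run_date):
    """Build the JSON body for one notification (event: started/succeeded/failed)."""
    return {"text": "DAG {} {} (run date: {})".format(dag_id, event, run_date)}


def post_json(url, payload):
    """POST a JSON payload; assumes `url` is a Slack incoming-webhook URL."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req, timeout=10)


def notify_slack(event, dag_id, run_date, webhook_url, post=post_json):
    """Send one notification; `post` can be swapped out for testing."""
    payload = build_slack_payload(event, dag_id, run_date)
    post(webhook_url, payload)
    return payload
```

In a DAG, one such call could be wrapped in a task at the start and at the
end, and another in a task that only runs when something upstream fails. The
result of `merge_emails(GLOBAL_ALERT_EMAILS, team_emails)` would go wherever
the DAG's alert emails are configured. This is exactly the per-DAG
boilerplate that built-in DAG-level callbacks and a global alert config
would remove.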


On Thu, 15 Nov 2018 at 00:10, James Meickle <jmeic...@quantopian.com.invalid>
wrote:

> As the author of the first linked PR, I think your points are good. Here is
> my attempt to address them:
>
> 1: It is possible to do this today if you write a Slack callback. I would
> be happy to share my code for this if you're having trouble integrating
> Slack. That being said, it would be great if Airflow provided several
> "default" callbacks for common platforms like Slack and Pagerduty.
>
> 2/3: Yes, Airflow should add callbacks for the DAG lifecycle, too. DAG
> "SLAs" on the other hand, I am not sure would provide any additional value,
> and have a high chance of being misused.
>
> 4: That's a great idea. My PR would make adding this very easy, because it
> redefines the "SLAMiss" object as having a "type" of SLA miss. This would
> involve adding a new type to the enum, and some logic to check when to
> create an SLA miss of this type.
>
> 5: My interpretation is that you mean an email address that always gets
> notified, regardless of any more specific users that a task says it should
> email. (So not a default value to "emails", but instead an additional value
> that is always added.) I think this makes a lot of sense and would be easy
> to add to email. It would not be even remotely possible for a Slack
> integration right now, since there's no unified code for that.
>
> My preferred way of addressing this would be to get my PR merged as a
> starting point, which isolates a lot of this functionality from the
> scheduler code. Then have a broader AIP created, or possibly a pair of
> them: switching to a more general evented system for Airflow model
> lifecycles, and implementing pluggable notifiers (right now a lot of the
> email functionality is hardcoded) the same way that there is already
> pluggable logging.
>
> From an SRE perspective, two other pain points we run into: the statsd
> integration is subpar (at least when we ingest it in Datadog it's hard to
> actually alert on), and there's no /health or /healthz endpoints for the
> scheduler and worker so it's hard to know if they are healthy in a
> programmatic way.
>
> On Wed, Nov 14, 2018 at 1:06 PM Niels Zeilemaker <ni...@zeilemaker.nl>
> wrote:
>
> > I had a go once to introduce something similar, but never got it merged.
> > Maybe you can use it as an inspiration.
> >
> > https://github.com/apache/incubator-airflow/pull/2412
> >
> > Niels
> >
> > On Wed, 14 Nov 2018 at 16:43, Sai Phanindhra <phani8...@gmail.com> wrote:
> >
> > > The PR mentioned above addresses issues/bugs in the current
> > > functionality. I want to add more alerting mediums, including for SLAs.
> > >
> > > On Wed, 14 Nov 2018 at 20:51, airflowuser
> > > <airflowu...@protonmail.com.invalid> wrote:
> > >
> > > > There is a pending PR to refactor the SLA:
> > > > https://github.com/apache/incubator-airflow/pull/3584
> > > >
> > > > But it requires more reviews from committers.
> > > >
> > > >
> > > > Sent with ProtonMail Secure Email.
> > > >
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Wednesday, November 14, 2018 5:11 PM, Sai Phanindhra <
> > > > phani8...@gmail.com> wrote:
> > > >
> > > > > Hello Airflow committers and maintainers,
> > > > > I came across SLAs in Airflow. It's a very good feature to begin
> > > > > with. I feel a few enhancements can be made. These enhancements
> > > > > are not limited to just SLAs; they are basically gaps I noticed
> > > > > while using Airflow. I'm listing a few of them here.
> > > > >
> > > > > 1.  SLA alerts to Slack channel(s) along with emails.
> > > > > 2.  Alerts at the DAG level (starting, success and failure).
> > > > > 3.  Custom callbacks just like `*on_failure_callback*`,
> > > > >     `*on_retry_callback*` and `*on_success_callback*` at the DAG
> > > > >     level.
> > > > > 4.  Alerts if a task completes before a minimum run time. (This is
> > > > >     really a rare case, but there are a few long-running jobs that
> > > > >     we know for sure run for at least a few hours, and if they exit
> > > > >     before that it means something is wrong. We need warning alerts
> > > > >     for such cases.)
> > > > > 5.  Default/global alert config (default emails to send all alerts
> > > > >     to and/or a Slack channel to send alerts to).
> > > > >
> > > > > Some of these might have already been solved, or someone may be
> > > > > working to solve them. Please share your thoughts and add anything
> > > > > else I missed to this list.
> > > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Sai Phanindhra,
> > > Ph: +91 9043258999
> > >
> >
>


-- 
Sai Phanindhra,
Ph: +91 9043258999
