Thanks James for the input. For the problems I described above, I built hacky solutions, such as adding a `*slack_start_notification_operation*` at the beginning, a `*slack_end_notification_operator*` at the end, and a `*slack_failed_notification_operation*` that runs when upstream fails. This addresses the first three issues/feature requirements I mentioned. For point 5, I maintain lists of emails and join all the required emails at the DAG level. Still, this is manual work that has to be repeated every time a new DAG is onboarded in Airflow. I think these are common problems that many Airflow users/developers face. @James <jmeic...@quantopian.com>, let's catch up sometime on Slack/Hangouts to discuss how these enhancements can be done.
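A rough sketch of the kind of helpers behind this workaround, to make the discussion concrete. Everything here is an assumption for illustration, not my actual code: the names `GLOBAL_ALERT_EMAILS`, `merge_emails` and `build_slack_payload` are hypothetical, the `context` argument is a plain dict standing in for the much richer context Airflow passes to callbacks, and the payload shape is only a guess at what a Slack incoming webhook would be sent.

```python
# Hypothetical global list: addresses that should receive every alert (point 5).
GLOBAL_ALERT_EMAILS = ["data-platform@example.com"]

# Hypothetical mapping from DAG lifecycle event to a Slack emoji prefix.
EVENT_ICONS = {
    "start": ":arrow_forward:",
    "success": ":white_check_mark:",
    "failure": ":x:",
}


def merge_emails(*email_lists):
    """Join several email lists, dropping case-insensitive duplicates
    while keeping the order in which addresses first appear."""
    seen = set()
    merged = []
    for emails in email_lists:
        for email in emails:
            cleaned = email.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


def build_slack_payload(event, context, channel="#airflow-alerts"):
    """Build a message payload for a DAG start/success/failure event.

    `context` is assumed to be a dict with "dag_id" and "execution_date"
    keys; this is a simplification, not Airflow's real callback context.
    """
    if event not in EVENT_ICONS:
        raise ValueError("unknown event: {!r}".format(event))
    text = "{icon} DAG `{dag_id}` {event} (execution date: {date})".format(
        icon=EVENT_ICONS[event],
        dag_id=context.get("dag_id", "unknown"),
        event=event,
        date=context.get("execution_date", "unknown"),
    )
    # "channel" is an assumed field; whether it is honored depends on the
    # kind of webhook in use.
    return {"channel": channel, "text": text}
```

Functions like these could be called from the notification tasks, or from callbacks such as `on_failure_callback`, but they still have to be wired into every DAG by hand, which is exactly the manual step I would like Airflow to take over.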

On Thu, 15 Nov 2018 at 00:10, James Meickle <jmeic...@quantopian.com.invalid> wrote:

> As the author of the first linked PR, I think your points are good. Here is
> my attempt to address them:
>
> 1: It is possible to do this today if you write a Slack callback. I would
> be happy to share my code for this if you're having trouble integrating
> Slack. That being said, it would be great if Airflow provided several
> "default" callbacks for common platforms like Slack and Pagerduty.
>
> 2/3: Yes, Airflow should add callbacks for the DAG lifecycle, too. DAG
> "SLAs" on the other hand, I am not sure would provide any additional value,
> and have a high chance of being misused.
>
> 4: That's a great idea. My PR would make adding this very easy, because it
> redefines the "SLAMiss" object as having a "type" of SLA miss. This would
> involve adding a new type to the enum, and some logic to check when to
> create an SLA miss of this type.
>
> 5: My interpretation is that you mean an email address that always gets
> notified, regardless of any more specific users that a task says it should
> email. (So not a default value to "emails", but instead an additional value
> that is always added.) I think this makes a lot of sense and would be easy
> to add to email. It would not be even remotely possible for a Slack
> integration right now, since there's no unified code for that.
>
> My preferred way of addressing this would be to get my PR merged as a
> starting point, which isolates a lot of this functionality from the
> scheduler code. Then have a broader AIP created, or possibly a pair of
> them: switching to a more general evented system for Airflow model
> lifecycles, and implementing pluggable notifiers (right now a lot of the
> email functionality is hardcoded) the same way that there is already
> pluggable logging.
>
> From an SRE perspective, two other pain points we run into: the statsd
> integration is subpar (at least when we ingest it in Datadog it's hard to
> actually alert on), and there's no /health or /healthz endpoints for the
> scheduler and worker so it's hard to know if they are healthy in a
> programmatic way.
>
> On Wed, Nov 14, 2018 at 1:06 PM Niels Zeilemaker <ni...@zeilemaker.nl>
> wrote:
>
> > I had a go once to introduce something similar, but never got it merged.
> > Maybe you can use it as an inspiration.
> >
> > https://github.com/apache/incubator-airflow/pull/2412
> >
> > Niels
> >
> > On Wed, 14 Nov 2018 at 16:43, Sai Phanindhra <phani8...@gmail.com> wrote:
> >
> > > The above-mentioned PR addresses issues/bugs in current functionality.
> > > I want to add more mediums of alerting, which includes SLA.
> > >
> > > On Wed, 14 Nov 2018 at 20:51, airflowuser
> > > <airflowu...@protonmail.com.invalid> wrote:
> > >
> > > > There is a pending PR to refactor the SLA:
> > > > https://github.com/apache/incubator-airflow/pull/3584
> > > >
> > > > But it requires more reviews from committers.
> > > >
> > > > Sent with ProtonMail Secure Email.
> > > >
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Wednesday, November 14, 2018 5:11 PM, Sai Phanindhra
> > > > <phani8...@gmail.com> wrote:
> > > >
> > > > > Hello airflow committers and maintainers,
> > > > > I came across SLA in Airflow. It's a very good feature to begin
> > > > > with. I feel like a few enhancements can be done. These
> > > > > enhancements are not limited to just SLA; they are basically
> > > > > voids I felt when I'm using Airflow. I'm listing a few of them
> > > > > here.
> > > > >
> > > > > 1. SLA alerts to Slack channel(s) along with emails.
> > > > > 2. Alerts at DAG level (starting, success and failure).
> > > > > 3. Custom callbacks just like `*on_failure_callback*`,
> > > > >    `*on_retry_callback*` and `*on_success_callback*` on DAG level.
> > > > > 4. Alerts if a task gets completed before a minimum run time.
> > > > >    (This is really a rare case. But there will be a few
> > > > >    long-running jobs that we know for sure run for at least a few
> > > > >    hours, and if they exit before that it means something is
> > > > >    wrong. We need warning alerts for such cases.)
> > > > > 5. Default/global alert config (default emails to send all alerts
> > > > >    and/or Slack channel to send alerts).
> > > > >
> > > > > Some of these might have already been solved, or someone is
> > > > > working to solve them. Please share your thoughts and add
> > > > > anything else I missed to this list.
> > >
> > > --
> > > Sai Phanindhra,
> > > Ph: +91 9043258999

--
Sai Phanindhra,
Ph: +91 9043258999