potiuk commented on issue #24914:
URL: https://github.com/apache/airflow/issues/24914#issuecomment-1181414773

   > Thanks for getting back to me so quickly! Normally when I'm looking at 
these charts I'm looking for things like anomalous run times (either because of 
starting late or taking too long) and degradation of performance... general 
indicators that we need to take some corrective action on the pipeline to keep 
it hitting SLA. Ideally we would also use the SLA functionality in Airflow but 
in its current format it's not particularly useful. Until then we're reliant on 
data presented in the UI itself with someone monitoring it on a periodic basis.
   
   I think it's actually best if you can monitor your Airflow SLA for Tasks 
outside of Airflow. The SLA feature is indeed not very useful, but I personally 
think Airflow in its future incarnation will become more of a platform and 
expose more of its metrics and stats to external systems that are far better 
at "monitoring" than Airflow will ever be. Usually I'd recommend integrating 
stats from Airflow into other monitoring tools you already use. This will be 
even easier after we add OpenTelemetry support: 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow
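
   As a rough illustration of the kind of check an external tool could run, 
here is a minimal sketch. It assumes task durations (in seconds) have already 
been exported from Airflow into some other system, for example through its 
StatsD metrics. The function name `flag_slow_runs` and the 1.5x threshold are 
hypothetical choices for this example, not part of Airflow.

```python
from statistics import median


def flag_slow_runs(durations, factor=1.5):
    """Return indices of runs that look anomalously slow.

    `durations` is a list of run times in seconds, oldest first. It is
    assumed to come from metrics exported out of Airflow; this function
    does not talk to Airflow itself. A run is flagged when it takes more
    than `factor` times the median of all runs before it.
    """
    flagged = []
    for i in range(1, len(durations)):
        # Baseline is the median of every earlier run, so a single
        # earlier outlier does not shift it much.
        baseline = median(durations[:i])
        if durations[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Example: the fourth run (index 3) is much slower than its predecessors.
slow = flag_slow_runs([100, 110, 105, 300, 102])
```

   A real monitoring tool would do this with its own alerting rules; the point 
is only that the comparison lives outside Airflow.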
   
   A very good example of what can be achieved even today was actually 
presented at the Airflow Summit 2022: 
https://airflowsummit.org/sessions/2022/the-slayer-your-data-pipeline-needs/ 
It is a cool example of how you can build your monitoring using Airflow as 
a platform.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
