Jorricks commented on PR #25489: URL: https://github.com/apache/airflow/pull/25489#issuecomment-1209121259
@argibbs these changes look great! Very well described as well. > am acutely aware / wary of the fact that this separation-of-the-queues-plus-config-param is purely solving a problem that is (for me at least) purely a theoretically-possible issue. I've never seen it, but I don't think it's premature optimisation to tackle it in this change. We have had the case where because of a crucial singel point of failure, hardly any DAG was able to continue (as we use a lot of ExternalTaskSensors). This meant that all of our schedulers were down and handling SLA fires for a long time so I am very happy to see this optimisation 👍 ### Note: I don't think this should be included in this MR but given you worked on this, you probably have a good view on this. I was thinking. Part of what we can speed up is to instead of fire each SLA for the same DAG individually, group SLAs per DAG. Thereby, we add a dag with an SLA to the queue instead of an actual SLA. This prevents us from parsing a DAG file 10 times, for 10 different SLAs of the same DAG. I guess you could optimise this by including a dummy operator at the end that requires all the important tasks to succeed, but that's not super user friendly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
