OK, so one difference here is that you're adding a new DAG SLA concept,
which is useful.  One subtle difference from what I think is the existing
"concept" of SLA is that you are evaluating it against when the DAG run
actually started, as opposed to when it should have started, and
evaluating it only while the run is in progress.

Let's suppose for a moment that everyone is on board with this and thinks
it's a necessary tradeoff.

Well, now let's look at individual task instance SLAs.  With that change in
the concept of what an SLA is, do we still *need* to move to a "soft
timeout" for individual tasks?  I think maybe not.  Why could we not, at
the same time as we evaluate the DAG run SLA, also evaluate each task's
SLA, and evaluate it against the same "start time" that the overall DAG
SLA is evaluated against?  This would seem to be more *like* the existing
SLA concept for individual tasks, the difference being that it requires the
DAG to be running (which is already a requirement of your new task SLA
concept).  The other difference, again, is the start time vs.
should-have-started time distinction.  But this would also seem to remove
the "doesn't work for deferrables" problem.
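
To make that concrete, here is a rough sketch of the kind of check I have
in mind.  None of this is real Airflow code: DagRunState, TaskState and
find_sla_misses are names I made up for illustration, and I'm assuming the
check runs wherever the DAG run SLA is already being evaluated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TaskState:
    """Hypothetical stand-in for a task instance; not an Airflow class."""
    task_id: str
    sla: Optional[timedelta]  # None means the task has no SLA
    finished: bool = False


@dataclass
class DagRunState:
    """Hypothetical stand-in for a DAG run; not an Airflow class."""
    started_at: datetime  # actual start, not "should have started"
    sla: Optional[timedelta]
    tasks: List[TaskState] = field(default_factory=list)
    finished: bool = False


def find_sla_misses(run: DagRunState, now: datetime) -> List[str]:
    """Return what has missed its SLA as of `now`.

    The DAG run and every task are measured against the same clock:
    time elapsed since the run actually started.
    """
    elapsed = now - run.started_at
    misses = []
    if run.sla is not None and not run.finished and elapsed > run.sla:
        misses.append("<dag_run>")
    for task in run.tasks:
        if task.sla is not None and not task.finished and elapsed > task.sla:
            misses.append(task.task_id)
    return misses
```

Note that the check only asks whether a task has finished, never whether
it is queued, running, or deferred, which is why I'd expect the
deferrables problem to go away under this approach.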
