jaklan commented on PR #43453: URL: https://github.com/apache/airflow/pull/43453#issuecomment-2474901731
@joellabes thanks for the input! The most confusing part here was whether we should take "UI mental model" or "API mental model": - UI mental model: e.g. each day we have a new scheduled run, but if some run fails - we can additionally trigger "retry from failure" - API mental model: `run` and `rerun` are 2 different endpoints, [where `rerun` can also trigger the full run](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Retry%20Failed%20Job): > Use this endpoint to retry a failed run for a job from the point of failure, if the run failed. **Otherwise trigger a new run.** So one could argue the expected behaviour of the operator option is just to switch the endpoint which is used. We were in favor of the first approach, but we didn't want to take that decision by ourselves - now it's confirmed it's aligned with your vision as well 😉 Having said that, I wonder how we can make it more robust. With the current implementation, Airflow task can fail for whatever reason not related to dbt Cloud job (e.g. queue timeout) and be retried, which could trigger a significantly different behaviour if the previous run (e.g. from the previous day) in dbt Cloud didn't succeed - and that could lead to unexpected results. Quick idea could be to check if there's any dbt Cloud run ID associated with the first Airflow task try and based on its existence / status (+ `retry_from_failure` option) decide how retry should exactly behave, but I don't recall any simple, generic way to keep state between task retries (afaik XComs are cleared). Any ideas are more than welcome. > As for the tradeoffs between the possible approaches, I think it would have been useful to flag that it was a breaking change when the PR was opened, as opposed to https://github.com/apache/airflow/pull/43453#issuecomment-2466681024. The initial issue enumerated the options but didn't explicitly call out the tradeoffs associated with each one. That's true, we haven't expected things to happen so fast 😄 Especially we thought [that comment](https://github.com/apache/airflow/pull/43453#issuecomment-2465988895) would simply "freeze" the MR until we have a discussion during the today meeting, but well, it didn't 🙈 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
