jaklan commented on PR #43453:
URL: https://github.com/apache/airflow/pull/43453#issuecomment-2474901731

   @joellabes thanks for the input! The most confusing part here was whether we 
should take "UI mental model" or "API mental model":
   - UI mental model: e.g. each day we have a new scheduled run, but if some 
run fails - we can additionally trigger "retry from failure"
   - API mental model: `run` and `rerun` are 2 different endpoints, [where 
`rerun` can also trigger the full 
run](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Retry%20Failed%20Job):
     > Use this endpoint to retry a failed run for a job from the point of 
failure, if the run failed. **Otherwise trigger a new run.** 
     
   So one could argue the expected behaviour of the operator option is just to 
switch the endpoint which is used. We were in favor of the first approach, but 
we didn't want to take that decision by ourselves - now it's confirmed it's 
aligned with your vision as well 😉
   
   Having said that, I wonder how we can make it more robust. With the current 
implementation, Airflow task can fail for whatever reason not related to dbt 
Cloud job (e.g. queue timeout) and be retried, which could trigger a 
significantly different behaviour if the previous run (e.g. from the previous 
day) in dbt Cloud didn't succeed - and that could lead to unexpected results.
   
   Quick idea could be to check if there's any dbt Cloud run ID associated with 
the first Airflow task try and based on its existence / status (+ 
`retry_from_failure` option) decide how retry should exactly behave, but I 
don't recall any simple, generic way to keep state between task retries (afaik 
XComs are cleared). Any ideas are more than welcome.
   
   > As for the tradeoffs between the possible approaches, I think it would 
have been useful to flag that it was a breaking change when the PR was opened, 
as opposed to 
https://github.com/apache/airflow/pull/43453#issuecomment-2466681024. The 
initial issue enumerated the options but didn't explicitly call out the 
tradeoffs associated with each one.
   
   That's true, we haven't expected things to happen so fast 😄 Especially we 
thought [that 
comment](https://github.com/apache/airflow/pull/43453#issuecomment-2465988895) 
would simply "freeze" the MR until we have a discussion during the today 
meeting, but well, it didn't 🙈


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to