pmcquighan-camus commented on issue #57359: URL: https://github.com/apache/airflow/issues/57359#issuecomment-3457423860
> [@pmcquighan-camus](https://github.com/pmcquighan-camus), just my two cents, but I think that we should stick to Airflow retries for something like this. When you add retries within the logic in `.execute()`, it can cause some general confusion/hinder understanding as to what's actually going on.
>
> Let's say I'm a new user, and I only ever want my job to retry 3 times. I'd set my `retries=3` at the Task-level. Now, unknown to me, there is logic in the operator that retries 10 times without the Task failing. This would be unintended behavior.

That makes sense to me in general. In this specific instance, though, the actual Dataflow job was only ever launched once, and the task failed while polling for completion status in the trigger. Airflow did 2 retries, but both seemed to fail immediately: the trigger was marked as "failed" from the first attempt, so the 2nd/3rd attempts just started executing with `on_complete` and failed right away [here](https://github.com/apache/airflow/blob/3.1.0/providers/google/src/airflow/providers/google/cloud/operators/dataflow.py#L646-L649).

My understanding of what happened is that the 3 Airflow task attempts resulted in 1 Dataflow job being executed, and all 3 task attempts failed from a single service-level 503 while polling for status. I feel the 2nd/3rd attempts should re-run the Dataflow job, or at least retry querying for the status of the existing job, since the Dataflow job did in fact continue running to completion, and it would be surprising to end up with multiple Dataflow jobs for one Airflow task. A rough sketch of the retry-on-polling behavior I have in mind is below.
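For illustration only, here is a minimal sketch of what I mean by retrying the status poll rather than failing the task. `poll_job_status` is a hypothetical stand-in for whatever call the trigger makes to fetch the Dataflow job state, not the provider's actual code path:

```python
# Minimal sketch, NOT the provider's implementation: retry only the status
# poll on transient service errors (e.g. a 503), instead of failing the task
# while the already-launched Dataflow job keeps running.
from google.api_core.exceptions import ServiceUnavailable
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


@retry(
    retry=retry_if_exception_type(ServiceUnavailable),  # retry transient 503s only
    wait=wait_exponential(multiplier=2, max=60),        # back off between polls
    stop=stop_after_attempt(5),                         # eventually surface the error
    reraise=True,
)
def poll_job_status(hook, job_id: str, project_id: str, location: str) -> str:
    # Hypothetical helper: fetch the current state of an already-launched job.
    # The real trigger would do something equivalent via the Dataflow hook/API.
    job = hook.get_job(job_id=job_id, project_id=project_id, location=location)
    return job["currentState"]
```

Whether that kind of retry belongs in the trigger, the hook, or behind an operator parameter is the open design question; my main point is that a single transient 503 during polling shouldn't burn all the task-level retries on a job that is still running.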
