Ohashiro commented on issue #44618: URL: https://github.com/apache/airflow/issues/44618#issuecomment-2572631048
Hello @dabla, thank you for your investigation and suggestion!

Regarding your first point:

> I also went back to the code and I still don't understand why the call would still fail when the task instance for that operator is retried the second time by Airflow, as some time would have passed between the first and the second attempt, I would expect the second call to succeed but apparently it still doesn't.

From what I understand, when the task fails (in our case, because the refreshId was not found), the operator cancels the refresh using the `cancel_dataset_refresh` hook method. So when Airflow retries the task, a new refresh is triggered with a new refreshId, and that new refreshId can run into the same "not found" error.

https://github.com/apache/airflow/blob/413a1833c302e2409d2ac96c6521f97e6589a594/providers/src/airflow/providers/microsoft/azure/hooks/powerbi.py#L202

Regarding the fix, the delay solution we discussed seems to work well (which, IMHO, confirms the root cause of the bug), but I agree that it is more of a "quick fix" than a clean one. I can work on the refactor you suggested and test it, but I have a few questions first:

> so that when the seconds calls fails the operator can directly retry the second Trigger call

Regarding the retry mechanism you are suggesting in the operator, how would you implement it?

- Would you let the operator fail and then, on the task retry, have the operator directly retry fetching the status? In that case, would the refreshId be kept in XCom between the two task executions?
- Or would you have the operator retry the trigger if it fails? I'm not sure how that would work with `defer`; would you handle it in `execute_complete`?
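As a rough illustration of the second option (retrying the status lookup for the same refreshId instead of relying on a fixed delay), here is a minimal, self-contained sketch. It is not the provider's code: `fetch_refresh_status` is a hypothetical stand-in for whatever async call reads the refresh status, and `RefreshNotFoundError` is a hypothetical exception representing the "refreshId not found" response. Only that error is retried, with exponential backoff and a bounded number of attempts; any other exception propagates immediately.

```python
import asyncio
from typing import Awaitable, Callable


class RefreshNotFoundError(Exception):
    """Hypothetical error: the API does not (yet) know the given refreshId."""


async def get_status_with_retry(
    fetch_refresh_status: Callable[[str], Awaitable[str]],  # hypothetical status call
    refresh_id: str,
    max_attempts: int = 5,
    initial_delay: float = 1.0,
) -> str:
    """Fetch the status of a freshly triggered refresh, tolerating a short
    window in which the refreshId is not visible yet.

    Retries only on RefreshNotFoundError, doubling the delay after each
    failed attempt; re-raises once max_attempts is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delay = initial_delay
    attempt = 1
    while True:
        try:
            return await fetch_refresh_status(refresh_id)
        except RefreshNotFoundError:
            if attempt >= max_attempts:
                raise  # give up: the task fails and the existing cancel path runs
            await asyncio.sleep(delay)
            delay *= 2
            attempt += 1


async def _demo() -> None:
    calls = {"n": 0}

    async def fake_fetch(refresh_id: str) -> str:
        # Simulates the API not knowing the refreshId for the first two calls.
        calls["n"] += 1
        if calls["n"] <= 2:
            raise RefreshNotFoundError(refresh_id)
        return "Completed"

    status = await get_status_with_retry(fake_fetch, "refresh-123", initial_delay=0.01)
    print(status, calls["n"])  # prints "Completed 3"


if __name__ == "__main__":
    asyncio.run(_demo())
```

Retrying only on the not-found error keeps genuine failures fast, and the attempt cap means a refreshId that never shows up still ends in a task failure rather than polling forever. Where such a helper would live (the trigger's polling loop or the operator) is exactly the open question above.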
