nathadfield commented on PR #43785: URL: https://github.com/apache/airflow/pull/43785#issuecomment-2464188657
> I'm not strongly against, but why not use the existing operator for that? (I'm questioning the atomicity of transfer operators in general)

It's a fair question. The thought process came from a scenario I have encountered involving BigQuery dataset expiration policies, which automatically drop tables after a specified amount of time, e.g. 7 days. We use these for temporary/staging areas.

Now, suppose I use the `GCSToBigQueryOperator` with `CREATE_IF_NEEDED` to load some data, followed by another task that queries it. Initially this creates a table that will expire exactly 7 days after it was created. On that seventh day, if everything runs at the same time, the table will not have expired yet, so the GCSToBigQuery task will succeed but will not recreate the table. However, in the few seconds between this task ending and the downstream task starting, the table will be deleted, and the downstream task will fail because the table no longer exists.

The current solution is to add a task using `BigQueryDeleteTableOperator` before the load, which is perfectly viable but results in lots of extra tasks. Ideally there would be another `CREATE_DISPOSITION` option in BigQuery, perhaps `ALWAYS_RECREATE`, which would achieve the same outcome.
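To make the timing concrete, here is a toy, pure-Python model of the race (it is not Airflow or BigQuery code). It assumes a table expires exactly 7 days after it is created, that the day-7 load finishes 5 seconds before expiry, and that the downstream query starts 1 second after expiry. `Dataset` and `run` are hypothetical names for illustration, and `ALWAYS_RECREATE` is only the disposition proposed above, not an existing BigQuery option.

```python
from __future__ import annotations

from dataclasses import dataclass

DAY = 24 * 60 * 60        # seconds in a day
EXPIRATION = 7 * DAY      # assumed dataset default table expiration (7 days)


@dataclass
class Table:
    created_at: float


class Dataset:
    """Toy dataset whose tables are dropped a fixed time after creation."""

    def __init__(self, expiration: float) -> None:
        self.expiration = expiration
        self.tables: dict[str, Table] = {}

    def _expire(self, now: float) -> None:
        # Drop every table whose lifetime has elapsed (mimics dataset expiration).
        self.tables = {
            name: t for name, t in self.tables.items()
            if now < t.created_at + self.expiration
        }

    def exists(self, name: str, now: float) -> bool:
        self._expire(now)
        return name in self.tables

    def delete(self, name: str, now: float) -> None:
        # Missing tables are ignored, like a delete task that tolerates absence.
        self._expire(now)
        self.tables.pop(name, None)

    def load(self, name: str, now: float, disposition: str) -> None:
        self._expire(now)
        if disposition == "CREATE_IF_NEEDED":
            # Only create when absent: an existing table keeps its original expiry.
            if name not in self.tables:
                self.tables[name] = Table(created_at=now)
        elif disposition == "ALWAYS_RECREATE":
            # Hypothetical proposed option: always recreate, resetting the expiry clock.
            self.tables[name] = Table(created_at=now)
        else:
            raise ValueError(f"unknown disposition: {disposition}")


def run(disposition: str, delete_first: bool) -> bool:
    """Return True if the downstream query on day 7 still finds the table."""
    ds = Dataset(EXPIRATION)
    ds.load("staging", now=0, disposition=disposition)  # first run creates the table
    load_at = EXPIRATION - 5   # day-7 load runs 5 s before the table expires
    query_at = EXPIRATION + 1  # downstream query starts just after expiry
    if delete_first:
        # Workaround: an extra delete task before the load forces a fresh table.
        ds.delete("staging", now=load_at)
    ds.load("staging", now=load_at, disposition=disposition)
    return ds.exists("staging", now=query_at)
```

With these assumptions, `run("CREATE_IF_NEEDED", delete_first=False)` returns `False` (the query hits a dropped table), while both the delete-first workaround and the hypothetical `ALWAYS_RECREATE` disposition return `True`, because the table's expiry clock restarts at load time.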
