nathadfield commented on PR #43785:
URL: https://github.com/apache/airflow/pull/43785#issuecomment-2464188657

   > I'm not strongly against, but why not use the existing operator for 
that? (I'm questioning the atomicity of transfer operators in general)
   
   It's a fair question.
   
   The idea came from a scenario I have encountered involving BigQuery 
dataset expiration policies, which automatically drop tables after a specified 
amount of time, e.g. 7 days. We use these for temporary/staging areas.
   
   Now, suppose I use the `GCSToBigQueryOperator` with `CREATE_IF_NEEDED` to 
load some data, followed by another task that queries it. The first run creates 
a table that will expire exactly 7 days after it was created.
   
   On the seventh day, if everything runs at the same time of day, the table 
will not have expired yet, so the GCSToBQ task will succeed without recreating 
the table.  However, in the few seconds between this task ending and the 
downstream task starting, the table will be deleted, and the downstream task 
fails because the table no longer exists.
   
   The current solution is to add a task using `BigQueryDeleteTableOperator` 
before the load, which is perfectly viable but results in lots of extra tasks.  
Ideally there would be another `CREATE_DISPOSITION` option in BigQuery 
(`ALWAYS_RECREATE`?) that would achieve the same outcome.
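
   To make the timing concrete, here is a toy, standard-library-only model of 
the race and of the delete-first workaround. It is a sketch, not BigQuery or 
Airflow code: `ToyDataset`, `simulate`, the table name `staging` and the 
2-second/3-second offsets are all made up for illustration, and the model 
assumes that a `CREATE_IF_NEEDED` load into an existing table leaves its 
expiry untouched.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

TTL = timedelta(days=7)  # assumed dataset default table expiration


@dataclass
class ToyDataset:
    """In-memory stand-in for a dataset with a default table expiration."""

    expires_at: dict = field(default_factory=dict)  # table name -> expiry time

    def exists(self, table: str, now: datetime) -> bool:
        expiry = self.expires_at.get(table)
        return expiry is not None and now < expiry

    def load(self, table: str, now: datetime) -> None:
        # Mimics CREATE_IF_NEEDED: only a missing table is created,
        # so an existing table keeps its original expiry.
        if not self.exists(table, now):
            self.expires_at[table] = now + TTL

    def delete(self, table: str) -> None:
        self.expires_at.pop(table, None)

    def query(self, table: str, now: datetime) -> str:
        if not self.exists(table, now):
            raise LookupError(f"Not found: table {table}")
        return "ok"


def simulate(delete_first: bool) -> str:
    """Run the day-0 load, then the day-7 load and query."""
    t0 = datetime(2024, 11, 1, 6, 0, 0)
    ds = ToyDataset()
    ds.load("staging", t0)  # day 0: table created, expires at t0 + 7 days

    load_at = t0 + TTL - timedelta(seconds=2)   # day 7: just before expiry
    query_at = t0 + TTL + timedelta(seconds=3)  # downstream task, just after

    if delete_first:
        ds.delete("staging")  # workaround: force the load to recreate the table
    ds.load("staging", load_at)
    try:
        return ds.query("staging", query_at)
    except LookupError as exc:
        return str(exc)


print(simulate(delete_first=False))  # Not found: table staging
print(simulate(delete_first=True))   # ok
```

   Without the delete, the day-7 load sees a table with two seconds left to 
live and does not recreate it, so the query fails. With the delete, the load 
creates a fresh table with a new 7-day expiry.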


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
