shahar1 commented on PR #43785:
URL: https://github.com/apache/airflow/pull/43785#issuecomment-2464236370

   > > I'm not strongly against it, but why not use the existing operator 
for that? (I'm questioning the atomicity of transfer operators in general)
   > 
   > It's a fair question.
   > 
   > The thought process came about because of a scenario I have encountered 
revolving around the use of BigQuery dataset expiration policies that will 
automatically drop tables after a specified amount of time, e.g. 7 days, which 
we do for temporary/staging areas.
   > 
   > Now, suppose I use the `GCSToBigQueryOperator` with `CREATE_IF_NEEDED` to 
load some data, followed by another task that queries it. Initially, this 
creates a table that will expire exactly 7 days after it was created.
   > 
   > On that seventh day, if everything runs at the same time as before, the 
table will not have expired yet, so the GCSToBQ task will succeed without 
recreating the table. However, in the few seconds between this task ending and 
the downstream task starting, the table will be deleted, and the downstream 
task will fail because the table no longer exists.
   > 
   > The current solution to this is to add a prior task using 
`BigQueryDeleteTableOperator`, which is perfectly viable but results in 
lots of extra tasks. Ideally there would be another `CREATE_DISPOSITION` 
option in BigQuery - `ALWAYS_RECREATE`? - which would achieve the same outcome.
   
   Sounds fine by me, I'd be happy for additional feedback before merging.
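
To make the timing issue in the quoted scenario concrete, here is a minimal, stdlib-only sketch. It does not call BigQuery or Airflow: `FakeDataset`, `run_pipeline`, and the chosen timestamps are hypothetical stand-ins that only mimic a 7-day table expiration and `CREATE_IF_NEEDED`-style loading as described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict

# Assumed dataset-level default table expiration, taken from the scenario.
EXPIRATION = timedelta(days=7)


class TableNotFound(Exception):
    """Raised when a query targets a table that has expired or was never created."""


@dataclass
class FakeDataset:
    """Toy stand-in for a dataset with a default table expiration (not a real client)."""

    created_at: Dict[str, datetime] = field(default_factory=dict)

    def exists(self, table: str, now: datetime) -> bool:
        created = self.created_at.get(table)
        return created is not None and now < created + EXPIRATION

    def load(self, table: str, now: datetime) -> None:
        # Mimics CREATE_IF_NEEDED: the table is created only when it is missing,
        # so an existing table keeps its original expiration time.
        if not self.exists(table, now):
            self.created_at[table] = now

    def delete(self, table: str) -> None:
        # Mimics a delete that ignores a missing table.
        self.created_at.pop(table, None)

    def query(self, table: str, now: datetime) -> str:
        if not self.exists(table, now):
            raise TableNotFound(table)
        return "ok"


def run_pipeline(ds: FakeDataset, load_at: datetime, query_at: datetime,
                 delete_first: bool = False) -> str:
    """Run an optional delete, then the load, then the downstream query."""
    if delete_first:
        ds.delete("staging")
    ds.load("staging", load_at)
    return ds.query("staging", query_at)
```

In this sketch, a first run that loads at `t0 + 5s` creates the table. A run seven days later that loads at `t0 + 7d + 3s` still sees the table, so nothing is recreated, and a query at `t0 + 7d + 12s` raises `TableNotFound`. Passing `delete_first=True` forces a fresh table with a fresh expiration, which is the effect the extra delete task (or a hypothetical recreate-always disposition) is meant to achieve.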


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

