[
https://issues.apache.org/jira/browse/AIRFLOW-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676624#comment-16676624
]
Daniel Lamblin commented on AIRFLOW-2679:
-----------------------------------------
The operator uses the Google Cloud Storage hook to download the schema, and the
BigQuery hook to create the table, either as an external table or by loading the
data. For a load, it inserts a job whose configuration includes the write
disposition.
As you can see from the Google Cloud BigQuery API reference
[https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs],
configuration.load.writeDisposition supports only the three modes you listed,
which Airflow in turn supports.
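To make the point concrete, here is a minimal sketch of the load portion of a
job configuration, written as a plain Python dict. The function name
build_load_configuration and its parameters are hypothetical and are not part
of Airflow; only the field names (load, sourceUris, destinationTable,
writeDisposition) come from the jobs REST reference. It rejects anything other
than the three documented dispositions, which is why MERGE cannot be passed
here.

```python
# The three write dispositions documented for BigQuery load jobs.
VALID_WRITE_DISPOSITIONS = ("WRITE_TRUNCATE", "WRITE_APPEND", "WRITE_EMPTY")


def build_load_configuration(source_uris, project_id, dataset_id, table_id,
                             write_disposition):
    """Return the 'configuration' body for a load job (hypothetical helper)."""
    if write_disposition not in VALID_WRITE_DISPOSITIONS:
        raise ValueError(
            "Unsupported write disposition {!r}; expected one of {}".format(
                write_disposition, ", ".join(VALID_WRITE_DISPOSITIONS)))
    return {
        "load": {
            "sourceUris": list(source_uris),
            "destinationTable": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_id,
            },
            "writeDisposition": write_disposition,
        }
    }
```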
MERGE is a query statement, not a write disposition. It requires extra clauses
that specify what to do for rows that match and rows that do not.
Using it correctly involves two steps: loading the table, and merging the
loaded table with your target table.
Because the loaded table in this scenario is likely just a staging table that
will be discarded after the MERGE statement, it would make sense to create it as
an external table, possibly saving time overall.
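As a rough illustration of the second step, the sketch below builds a standard
SQL MERGE statement from a staging table into a target table. The helper name
build_merge_sql, the table names, and the column lists are all hypothetical;
the sketch assumes the staging table has already been created (loaded or
external) and that both tables share the listed columns.

```python
def build_merge_sql(target, staging, key_columns, update_columns):
    """Build a BigQuery standard SQL MERGE statement (hypothetical helper).

    Rows are matched on key_columns; matched rows get update_columns
    overwritten from staging, unmatched rows are inserted.
    """
    if not key_columns or not update_columns:
        raise ValueError("key_columns and update_columns must be non-empty")
    on_clause = " AND ".join(
        "T.{0} = S.{0}".format(col) for col in key_columns)
    set_clause = ", ".join(
        "{0} = S.{0}".format(col) for col in update_columns)
    all_columns = list(key_columns) + list(update_columns)
    insert_columns = ", ".join(all_columns)
    insert_values = ", ".join("S.{0}".format(col) for col in all_columns)
    return (
        "MERGE `{target}` T\n"
        "USING `{staging}` S\n"
        "ON {on_clause}\n"
        "WHEN MATCHED THEN\n"
        "  UPDATE SET {set_clause}\n"
        "WHEN NOT MATCHED THEN\n"
        "  INSERT ({insert_columns}) VALUES ({insert_values})"
    ).format(
        target=target,
        staging=staging,
        on_clause=on_clause,
        set_clause=set_clause,
        insert_columns=insert_columns,
        insert_values=insert_values,
    )


# Example with made-up table and column names.
sql = build_merge_sql(
    "proj.ds.target", "proj.ds.staging", ["id"], ["name", "price"])
```

The resulting string would then be run as a separate query task with legacy
SQL disabled, after the task that creates the staging table.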
> GoogleCloudStorageToBigQueryOperator to support MERGE
> -----------------------------------------------------
>
> Key: AIRFLOW-2679
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2679
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: jack
> Priority: Major
>
> Currently the GoogleCloudStorageToBigQueryOperator supports the
> write_disposition parameter, which can be: WRITE_TRUNCATE, WRITE_APPEND,
> WRITE_EMPTY
>
> However, Google has another very useful writing method, MERGE:
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples]
> Supporting the MERGE statement would be extremely useful.
> The idea behind this request is to do it directly from the Google Storage
> file rather than load the file into a table and then run another MERGE
> statement.
>
> The MERGE statement is really helpful when one wants records to be updated
> rather than appended or replaced.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)