[
https://issues.apache.org/jira/browse/AIRFLOW-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jack updated AIRFLOW-2679:
--------------------------
Description:
Currently the GoogleCloudStorageToBigQueryOperator supports the
write_disposition parameter, which can be WRITE_TRUNCATE, WRITE_APPEND, or
WRITE_EMPTY.
However, BigQuery has another very useful way of writing data, the MERGE
statement:
[https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples]
Support for the MERGE statement would be extremely useful.
The idea behind this request is to merge directly from the Google Cloud
Storage file, rather than loading the file into a table and then running a
separate MERGE statement.
The MERGE statement is really helpful when one wants records to be updated
rather than appended or replaced.
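For illustration, here is a minimal sketch of the two-step workaround that this request would like to avoid: the file is first loaded into a staging table, and a MERGE statement then upserts the staged rows into the target table. The helper name build_merge_sql, the table names, and the column names are hypothetical and are not part of Airflow or BigQuery; the function only builds the SQL string and does not run it.

```python
def build_merge_sql(target, staging, key_columns, columns):
    """Build a BigQuery standard-SQL MERGE that upserts staging rows into target.

    All arguments are illustrative: `target` and `staging` are fully qualified
    table names, `key_columns` identify a row, `columns` lists every column.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")

    # Rows match when every key column is equal in target (T) and staging (S).
    on_clause = " AND ".join(f"T.{c} = S.{c}" for c in key_columns)

    # Only non-key columns need to be overwritten on a match.
    non_keys = [c for c in columns if c not in key_columns]

    lines = [
        f"MERGE `{target}` T",
        f"USING `{staging}` S",
        f"ON {on_clause}",
    ]
    if non_keys:
        set_clause = ", ".join(f"{c} = S.{c}" for c in non_keys)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")

    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"S.{c}" for c in columns)
    lines.append(
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical tables: orders_staging was loaded from the storage file.
    print(
        build_merge_sql(
            target="my_project.my_dataset.orders",
            staging="my_project.my_dataset.orders_staging",
            key_columns=["order_id"],
            columns=["order_id", "status", "updated_at"],
        )
    )
```

Today the generated statement would have to be run as a separate query task after the load task; the request is for the operator to cover both steps.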
was:
Currently the GoogleCloudStorageToBigQueryOperator supports incremental load
using *max_id_key*.
However, many systems actually need an "UPSERT": if the row exists, update it;
if not, insert/copy it.
Currently the operator assumes that we only need to insert new data; it can't
handle updates to existing data. Most of the time data is not static; it
changes over time. Yesterday an order's status was NEW, today it's Processing,
tomorrow it's SENT, and in a month it will be REFUNDED, etc.
Summary: GoogleCloudStorageToBigQueryOperator to support MERGE (was:
GoogleCloudStorageToBigQueryOperator to support UPSERT)
> GoogleCloudStorageToBigQueryOperator to support MERGE
> -----------------------------------------------------
>
> Key: AIRFLOW-2679
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2679
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: jack
> Priority: Major
>
> Currently the GoogleCloudStorageToBigQueryOperator supports the
> write_disposition parameter, which can be WRITE_TRUNCATE, WRITE_APPEND, or
> WRITE_EMPTY.
>
> However, BigQuery has another very useful way of writing data, the MERGE
> statement:
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples]
> Support for the MERGE statement would be extremely useful.
> The idea behind this request is to merge directly from the Google Cloud
> Storage file, rather than loading the file into a table and then running a
> separate MERGE statement.
>
> The MERGE statement is really helpful when one wants records to be updated
> rather than appended or replaced.