[ 
https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Okolnychyi updated SPARK-35801:
-------------------------------------
       Shepherd: L. C. Hsieh
    Description: 
Row-level operations such as UPDATE, DELETE, and MERGE are becoming increasingly
important for modern big data workflows. Use cases include, but are not limited
to, deleting a set of records for regulatory compliance, updating a set of
records to fix an issue in the ingestion pipeline, and applying changes from a
transaction log to a fact table. Row-level operations let users express these
use cases concisely; without them, the same logic requires much more SQL. Common
patterns for updating partitions are to read, union, and overwrite, or to read,
diff, and append. With commands like MERGE, these operations are easier to
express and can be more efficient to run.
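
As a rough illustration of the "apply a transaction log to a fact table" use
case, the sketch below mimics the matched / not-matched branches of a MERGE on
an in-memory table. It is not Spark code: the names merge_changes, fact_table,
and change_log are hypothetical, and the change log is applied sequentially,
which is a simplification of SQL MERGE semantics.

```python
# Hypothetical sketch: none of these names are Spark or Data Source V2 APIs.

def merge_changes(target, changes):
    """Apply a change log to a keyed table, MERGE-style.

    target:  dict mapping key -> row (a dict of column values)
    changes: iterable of (op, key, row) tuples; op is "upsert" or "delete"
    """
    result = dict(target)  # shallow copy, so the input table is not modified
    for op, key, row in changes:
        if op == "delete":
            # Roughly: WHEN MATCHED AND <delete condition> THEN DELETE
            result.pop(key, None)
        elif key in result:
            # Roughly: WHEN MATCHED THEN UPDATE (only the given columns change)
            result[key] = {**result[key], **row}
        else:
            # Roughly: WHEN NOT MATCHED THEN INSERT
            result[key] = dict(row)
    return result


fact_table = {
    1: {"name": "alice", "amount": 10},
    2: {"name": "bob", "amount": 20},
}
change_log = [
    ("upsert", 2, {"amount": 25}),                   # update an existing row
    ("delete", 1, None),                             # remove a row
    ("upsert", 3, {"name": "carol", "amount": 30}),  # insert a new row
]
merged = merge_changes(fact_table, change_log)
```

Without a single command covering all three branches, the same result would
need separate read, diff, and write steps, which is the overhead the
description above refers to.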

Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/] 
and Spark should implement similar support.

SPIP: 
https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60


  was:
[MERGE INTO|https://en.wikipedia.org/wiki/Merge_(SQL)] is well suited to 
large-scale workloads because it can express operations to insert, update, or 
delete multiple rows in a single SQL command. Many updates can be expressed as 
MERGE INTO queries that would otherwise require much more SQL. Common patterns 
for updating partitions are to read, union, and overwrite or read, diff, and 
append. Using MERGE INTO, these operations are easier to express and can be 
more efficient to run.

Hive supports [MERGE INTO|https://blog.cloudera.com/update-hive-tables-easy-way/]
and Spark should implement similar support.

SPIP: 
https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60



> SPIP: Support MERGE in Data Source V2
> -------------------------------------
>
>                 Key: SPARK-35801
>                 URL: https://issues.apache.org/jira/browse/SPARK-35801
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Anton Okolnychyi
>            Priority: Major
>
> Row-level operations such as UPDATE, DELETE, and MERGE are becoming
> increasingly important for modern big data workflows. Use cases include, but
> are not limited to, deleting a set of records for regulatory compliance,
> updating a set of records to fix an issue in the ingestion pipeline, and
> applying changes from a transaction log to a fact table. Row-level operations
> let users express these use cases concisely; without them, the same logic
> requires much more SQL. Common patterns for updating partitions are to read,
> union, and overwrite, or to read, diff, and append. With commands like MERGE,
> these operations are easier to express and can be more efficient to run.
>
> Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/]
> and Spark should implement similar support.
>
> SPIP:
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
