[ https://issues.apache.org/jira/browse/HUDI-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-5324:
----------------------------------
    Description: 
h4. *UPDATED*

The originally reported issue was actually the result of a misconfigured 
MERGE INTO (MIT) statement: MIT was using the "insert" operation instead of 
"upsert".

The real issue, though, is that MIT implicitly makes its use of the "upsert" 
operation conditional on whether the "preCombine" config is set. Instead, it 
should always specify the operation as "upsert", since MIT allows update 
semantics to be specified without requiring the "preCombine" field to be 
present.
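
To make the difference concrete, here is a minimal sketch of the selection 
logic described above. It is not the actual Hudi code: the function names 
are hypothetical and the dict-based options are an assumption for 
illustration. Only the config keys "hoodie.datasource.write.operation" and 
"hoodie.datasource.write.precombine.field" are existing Hudi options.

```python
from typing import Dict

# Existing Hudi write config keys; everything else in this sketch is hypothetical.
OPERATION_KEY = "hoodie.datasource.write.operation"
PRECOMBINE_KEY = "hoodie.datasource.write.precombine.field"


def mit_operation_reported(options: Dict[str, str]) -> str:
    """Behavior described in this issue (illustrative only):
    "upsert" is chosen only when a preCombine field is configured,
    otherwise MERGE INTO falls back to "insert"."""
    return "upsert" if options.get(PRECOMBINE_KEY) else "insert"


def mit_operation_proposed(options: Dict[str, str]) -> str:
    """Proposed behavior (illustrative only): MERGE INTO always uses
    "upsert", whether or not a preCombine field is configured."""
    return "upsert"


def with_operation(options: Dict[str, str], operation: str) -> Dict[str, str]:
    """Return a copy of the write options with the operation set."""
    merged = dict(options)
    merged[OPERATION_KEY] = operation
    return merged


# Hypothetical table options without and with a preCombine field.
no_precombine: Dict[str, str] = {}
has_precombine = {PRECOMBINE_KEY: "ts"}

reported = with_operation(no_precombine, mit_operation_reported(no_precombine))
proposed = with_operation(no_precombine, mit_operation_proposed(no_precombine))
```

Under these assumptions, a table without a preCombine field ends up with 
"insert" under the reported behavior and with "upsert" under the proposed 
one. A table with a preCombine field gets "upsert" in both cases.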

  was:
~When setting hoodie.index.type=BLOOM in the hudi-defaults.conf, while the 
Spark SQL DELETE statement uses Bloom Index, the MERGE INTO statement does not 
seem to use Bloom Index and instead uses Simple Index.~
h4. *UPDATE*

Aforementioned issue was actually a result of misconfiguration of the Merge 
Into statement – MIT was using "insert" operation instead of "upsert".

Real issue though is that MIT implicitly predicates using "upsert" operation 
onto whether "preCombine" config is set. Instead, it should always specify 
operation as "upsert", since MIT allows to specify updating semantics w/o 
requiring presence of the "preCombine" field


> Spark SQL MERGE INTO statement should always do upsert if there's matching 
> update clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-5324
>                 URL: https://issues.apache.org/jira/browse/HUDI-5324
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: index, spark-sql
>            Reporter: Ethan Guo
>            Assignee: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> h4. *UPDATED*
> The originally reported issue was actually the result of a misconfigured 
> MERGE INTO (MIT) statement: MIT was using the "insert" operation instead of 
> "upsert".
> The real issue, though, is that MIT implicitly makes its use of the "upsert" 
> operation conditional on whether the "preCombine" config is set. Instead, it 
> should always specify the operation as "upsert", since MIT allows update 
> semantics to be specified without requiring the "preCombine" field to be 
> present.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
