[
https://issues.apache.org/jira/browse/HUDI-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-6479:
----------------------------
Fix Version/s: 0.16.0
> Update release docs and quick start guide around INSERT_INTO default behavior change
> -------------------------------------------------------------------------------------
>
> Key: HUDI-6479
> URL: https://issues.apache.org/jira/browse/HUDI-6479
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark-sql
> Reporter: sivabalan narayanan
> Assignee: Shiyan Xu
> Priority: Major
> Fix For: 0.15.0, 0.16.0
>
>
> With [this|https://github.com/apache/hudi/pull/9123] patch, we are also
> switching the default behavior of INSERT_INTO to use "insert" as the
> underlying operation. Until 0.13.1, the default was "upsert": if you
> ingested the same batch of records in commit1 and again in commit2, Hudi
> performed an upsert, and a snapshot read returned only the latest value.
> With this patch, the default becomes "insert", as the name (INSERT_INTO)
> signifies, so ingesting the same batch of records in commit1 and in commit2
> results in duplicate records on a snapshot read. If users explicitly
> override the respective configs, we honor them; only the default behavior,
> where none of the respective configs is overridden, changes.
>
>
>
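The difference described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Hudi code: the names `write_insert`, `write_upsert`, and `snapshot_read`, and the representation of a table as a list of (record key, value) pairs, are assumptions made up for illustration. It shows why writing the same batch in two commits yields duplicates under "insert" but not under "upsert".

```python
from typing import List, Tuple

# Hypothetical toy representation: a table is a list of (record_key, value) rows.
Record = Tuple[str, str]


def write_insert(table: List[Record], batch: List[Record]) -> List[Record]:
    """Model of "insert": append every incoming row without checking keys."""
    return table + list(batch)


def write_upsert(table: List[Record], batch: List[Record]) -> List[Record]:
    """Model of "upsert": rows in the batch replace existing rows with the same key."""
    merged = dict(table)          # record_key -> value
    for key, value in batch:
        merged[key] = value       # the later write wins
    return list(merged.items())


def snapshot_read(table: List[Record]) -> List[Record]:
    """Model of a snapshot read: return every row currently in the table."""
    return sorted(table)


batch = [("id1", "a"), ("id2", "b")]

# commit1 and commit2 both write the same batch.
insert_table = write_insert(write_insert([], batch), batch)
upsert_table = write_upsert(write_upsert([], batch), batch)

print(snapshot_read(insert_table))  # each key appears twice
print(snapshot_read(upsert_table))  # each key appears once
```

In this model the "insert" table ends up with four rows and the "upsert" table with two, which mirrors the behavior change: users who relied on the old default to deduplicate repeated batches would need to override the relevant config explicitly.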