This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 75edfa7  [MINOR][DOCS] Add more description for UPSERT operation(no duplicates) (#2572)
75edfa7 is described below

commit 75edfa7153f311bf6220dc25611b52cde0b5e7c3
Author: tooptoop4 <[email protected]>
AuthorDate: Sat Feb 13 07:59:14 2021 +0000

    [MINOR][DOCS] Add more description for UPSERT operation(no duplicates) (#2572)
---
 docs/_docs/0.7.0/2_2_writing_data.md | 2 +-
 docs/_docs/2_2_writing_data.md       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/_docs/0.7.0/2_2_writing_data.md b/docs/_docs/0.7.0/2_2_writing_data.md
index 91610f2..0ff2204 100644
--- a/docs/_docs/0.7.0/2_2_writing_data.md
+++ b/docs/_docs/0.7.0/2_2_writing_data.md
@@ -20,7 +20,7 @@ can be chosen/changed across each commit/deltacommit issued against the table.
 
  - **UPSERT** : This is the default operation where the input records are first tagged as inserts or updates by looking up the index.
  The records are ultimately written after heuristics are run to determine how best to pack them on storage to optimize for things like file sizing.
- This operation is recommended for use-cases like database change capture where the input almost certainly contains updates.
+ This operation is recommended for use-cases like database change capture where the input almost certainly contains updates. The target table will never show duplicates.
  - **INSERT** : This operation is very similar to upsert in terms of heuristics/file sizing but completely skips the index lookup step. Thus, it can be a lot faster than upserts
  for use-cases like log de-duplication (in conjunction with options to filter duplicates mentioned below). This is also suitable for use-cases where the table can tolerate duplicates, but just
  need the transactional writes/incremental pull/storage management capabilities of Hudi.
diff --git a/docs/_docs/2_2_writing_data.md b/docs/_docs/2_2_writing_data.md
index 07575f8..6b51878 100644
--- a/docs/_docs/2_2_writing_data.md
+++ b/docs/_docs/2_2_writing_data.md
@@ -19,7 +19,7 @@ can be chosen/changed across each commit/deltacommit issued against the table.
 
  - **UPSERT** : This is the default operation where the input records are first tagged as inserts or updates by looking up the index.
  The records are ultimately written after heuristics are run to determine how best to pack them on storage to optimize for things like file sizing.
- This operation is recommended for use-cases like database change capture where the input almost certainly contains updates.
+ This operation is recommended for use-cases like database change capture where the input almost certainly contains updates. The target table will never show duplicates.
  - **INSERT** : This operation is very similar to upsert in terms of heuristics/file sizing but completely skips the index lookup step. Thus, it can be a lot faster than upserts
  for use-cases like log de-duplication (in conjunction with options to filter duplicates mentioned below). This is also suitable for use-cases where the table can tolerate duplicates, but just
  need the transactional writes/incremental pull/storage management capabilities of Hudi.
