This is an automated email from the ASF dual-hosted git repository.
vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 75edfa7 [MINOR][DOCS] Add more description for UPSERT operation(no duplicates) (#2572)
75edfa7 is described below
commit 75edfa7153f311bf6220dc25611b52cde0b5e7c3
Author: tooptoop4 <[email protected]>
AuthorDate: Sat Feb 13 07:59:14 2021 +0000
[MINOR][DOCS] Add more description for UPSERT operation(no duplicates) (#2572)
---
docs/_docs/0.7.0/2_2_writing_data.md | 2 +-
docs/_docs/2_2_writing_data.md | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/_docs/0.7.0/2_2_writing_data.md b/docs/_docs/0.7.0/2_2_writing_data.md
index 91610f2..0ff2204 100644
--- a/docs/_docs/0.7.0/2_2_writing_data.md
+++ b/docs/_docs/0.7.0/2_2_writing_data.md
@@ -20,7 +20,7 @@ can be chosen/changed across each commit/deltacommit issued against the table.
- **UPSERT** : This is the default operation where the input records are
first tagged as inserts or updates by looking up the index.
The records are ultimately written after heuristics are run to determine how
best to pack them on storage to optimize for things like file sizing.
- This operation is recommended for use-cases like database change capture where the input almost certainly contains updates.
+ This operation is recommended for use-cases like database change capture where the input almost certainly contains updates. The target table will never show duplicates.
- **INSERT** : This operation is very similar to upsert in terms of
heuristics/file sizing but completely skips the index lookup step. Thus, it can
be a lot faster than upserts
for use-cases like log de-duplication (in conjunction with options to filter
duplicates mentioned below). This is also suitable for use-cases where the
table can tolerate duplicates, but just
need the transactional writes/incremental pull/storage management
capabilities of Hudi.
diff --git a/docs/_docs/2_2_writing_data.md b/docs/_docs/2_2_writing_data.md
index 07575f8..6b51878 100644
--- a/docs/_docs/2_2_writing_data.md
+++ b/docs/_docs/2_2_writing_data.md
@@ -19,7 +19,7 @@ can be chosen/changed across each commit/deltacommit issued against the table.
- **UPSERT** : This is the default operation where the input records are
first tagged as inserts or updates by looking up the index.
The records are ultimately written after heuristics are run to determine how
best to pack them on storage to optimize for things like file sizing.
- This operation is recommended for use-cases like database change capture where the input almost certainly contains updates.
+ This operation is recommended for use-cases like database change capture where the input almost certainly contains updates. The target table will never show duplicates.
- **INSERT** : This operation is very similar to upsert in terms of
heuristics/file sizing but completely skips the index lookup step. Thus, it can
be a lot faster than upserts
for use-cases like log de-duplication (in conjunction with options to filter
duplicates mentioned below). This is also suitable for use-cases where the
table can tolerate duplicates, but just
need the transactional writes/incremental pull/storage management
capabilities of Hudi.
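
The difference the patched sentence describes can be illustrated with a toy model. The sketch below is plain Python and does not use any Hudi API: the `upsert` and `insert` functions, the `id` key field, and the list-of-dicts "table" are all hypothetical stand-ins. It assumes records are identified by a single key, and that an in-memory dict can play the role of the index lookup.

```python
def upsert(table, records, key="id"):
    """Toy upsert: look each record up by key, then update or append."""
    # Hypothetical stand-in for the index: record key -> position in the table.
    index = {row[key]: pos for pos, row in enumerate(table)}
    result = list(table)
    for rec in records:
        pos = index.get(rec[key])
        if pos is None:
            # Key not seen before: tagged as an insert.
            index[rec[key]] = len(result)
            result.append(rec)
        else:
            # Key already present: tagged as an update, replaces the old row.
            result[pos] = rec
    return result


def insert(table, records):
    """Toy insert: skip the lookup entirely and append every record."""
    return list(table) + list(records)


table = [{"id": 1, "v": "a"}]
batch = [{"id": 1, "v": "b"}, {"id": 2, "v": "c"}]

upserted = upsert(table, batch)   # one row per key
inserted = insert(table, batch)   # key 1 now appears twice
```

In this model `upserted` holds exactly one row per key, while `inserted` keeps both rows for key 1, which is the trade-off the INSERT entry describes: the lookup is skipped, so duplicates are possible unless they are filtered separately.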