[ https://issues.apache.org/jira/browse/HUDI-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411296#comment-17411296 ]
ASF GitHub Bot commented on HUDI-2369:
--------------------------------------
nsivabalan commented on a change in pull request #3549:
URL: https://github.com/apache/hudi/pull/3549#discussion_r703597160
##########
File path: website/blog/2021-08-27-bulk-insert-sort-modes.md
##########
@@ -0,0 +1,88 @@
+---
+title: "Bulk Insert Sort Modes with Apache Hudi"
+excerpt: "Different sort modes available with BulkInsert"
+author: shivnarayan
+category: blog
+---
+
+Apache Hudi supports a `bulk_insert` operation in addition to "insert" and
"upsert" to ingest data into a hudi table.
+There are different sort modes that one could employ while using bulk_insert.
This blog will talk about
+different sort modes available out of the box, and how each compares with
others.
+<!--truncate-->
+
+Apache Hudi supports “bulk_insert” to assist in initial loading of data into a
hudi table. This is expected
+to be faster when compared to using “insert” or “upsert” operations. Bulk
insert differs from insert in two
+aspects. Existing records are never looked up with bulk_insert, and some
writer side optimizations like
Review comment:
I think two configs got mixed up here. One is dedup (combine before
insert) and the other is Insert_Drop_Dupes. Dedup only removes duplicates within
the incoming batch of records, while Insert_Drop_Dupes drops incoming records
that are already in storage. With the row writer path, we don't support
Insert_Drop_Dupes.
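
To make the distinction concrete, here is a minimal plain-Python sketch of the two behaviours described above. It does not call any Hudi API: the function names, the `(record_key, ordering_value, payload)` record shape, and the "keep the highest ordering value" rule are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (record_key, ordering_value, payload).
Record = Tuple[str, int, str]


def combine_before_insert(batch: Iterable[Record]) -> List[Record]:
    """Dedup within the incoming batch only.

    Keeps one record per key, preferring the highest ordering value
    (an assumed tie-breaking rule). Storage is never consulted.
    """
    latest: Dict[str, Record] = {}
    for rec in batch:
        key = rec[0]
        if key not in latest or rec[1] > latest[key][1]:
            latest[key] = rec
    return list(latest.values())


def insert_drop_dupes(batch: Iterable[Record], keys_in_storage: Set[str]) -> List[Record]:
    """Drop incoming records whose key already exists in storage.

    Duplicates inside the batch itself are left untouched.
    """
    return [rec for rec in batch if rec[0] not in keys_in_storage]


batch = [("k1", 1, "a"), ("k1", 2, "b"), ("k2", 1, "c"), ("k3", 1, "d")]
stored = {"k3"}  # keys assumed to be already present in the table

deduped = combine_before_insert(batch)       # k1 collapses to one record, k3 is kept
filtered = insert_drop_dupes(batch, stored)  # k3 is dropped, both k1 records are kept
```

In this sketch the first function never looks at storage and the second never collapses duplicates inside the batch, which is the difference the blog text should reflect.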
> Blog on bulk insert sort modes
> ------------------------------
>
> Key: HUDI-2369
> URL: https://issues.apache.org/jira/browse/HUDI-2369
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Docs
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Blog on bulk insert sort modes
--
This message was sent by Atlassian Jira
(v8.3.4#803005)