pratyakshsharma commented on a change in pull request #3549:
URL: https://github.com/apache/hudi/pull/3549#discussion_r707523359
##########
File path: website/blog/2021-08-27-bulk-insert-sort-modes.md
##########
@@ -0,0 +1,88 @@
+---
+title: "Bulk Insert Sort Modes with Apache Hudi"
+excerpt: "Different sort modes available with BulkInsert"
+author: shivnarayan
+category: blog
+---
+
+Apache Hudi supports a `bulk_insert` operation in addition to "insert" and "upsert" to ingest data into a hudi table.
+There are different sort modes that one could employ while using bulk_insert. This blog will talk about
+different sort modes available out of the box, and how each compares with others.
+<!--truncate-->
+
+Apache Hudi supports “bulk_insert” to assist in initial loading to data to a hudi table. This is expected
+to be faster when compared to using “insert” or “upsert” operations. Bulk insert differs from insert in two
+aspects. Existing records are never looked up with bulk_insert, and some writer side optimizations like
Review comment:
I still think there is some confusion here. I went through the entire
flow of DeltaStreamer, and based on the following two lines, both types
of deduplication happen for INSERT as well as BULK_INSERT:
1. https://github.com/apache/hudi/blob/5d60491f5b76ef0f77174d71567d0673d9315bcd/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L469
2. https://github.com/apache/hudi/blob/5d60491f5b76ef0f77174d71567d0673d9315bcd/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L597

Please correct me if I am still getting it wrong, @nsivabalan.
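To make the distinction concrete, here is a minimal sketch of two kinds of deduplication, assuming they are (1) collapsing duplicate keys within the incoming batch and (2) dropping incoming records whose keys already exist in the table. These appear to be what `hoodie.combine.before.insert` and `--filter-dupes`, respectively, control in DeltaStreamer, but please double-check against the linked lines. `DedupSketch`, `Rec`, `combineWithinBatch` and `dropExistingKeys` are hypothetical names for illustration only, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: NOT Hudi's API. All names here are made up for this sketch.
public class DedupSketch {

  // A simplified incoming record: record key, ordering (precombine-like) value, payload.
  record Rec(String key, long ts, String payload) {}

  // Kind 1 (assumed): collapse duplicates inside the incoming batch,
  // keeping the record with the largest ordering value per key (later records win ties).
  static List<Rec> combineWithinBatch(List<Rec> batch) {
    Map<String, Rec> latest = new LinkedHashMap<>();
    for (Rec r : batch) {
      latest.merge(r.key(), r, (a, b) -> b.ts() >= a.ts() ? b : a);
    }
    return new ArrayList<>(latest.values());
  }

  // Kind 2 (assumed): drop incoming records whose key already exists in the table.
  // This requires knowing the existing keys, i.e. a lookup against the table.
  static List<Rec> dropExistingKeys(List<Rec> batch, Set<String> existingKeys) {
    List<Rec> out = new ArrayList<>();
    for (Rec r : batch) {
      if (!existingKeys.contains(r.key())) {
        out.add(r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Rec> batch = List.of(
        new Rec("k1", 1, "a"),
        new Rec("k1", 2, "b"),
        new Rec("k2", 1, "c"));
    List<Rec> combined = combineWithinBatch(batch);           // k1 -> "b", k2 -> "c"
    List<Rec> fresh = dropExistingKeys(combined, Set.of("k2")); // only k1 remains
    System.out.println(combined.size() + " " + fresh.size());  // prints "2 1"
  }
}
```

Note that only the second kind needs to know which keys already exist in the table, so if it runs for BULK_INSERT, the statement that existing records are never looked up with bulk_insert would not hold in that case.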
--
This is an automated message from the Apache Git Service.