[GitHub] [hudi] pratyakshsharma commented on a change in pull request #3549: [HUDI-2369] Blog on bulk_insert sort modes

GitBox Mon, 13 Sep 2021 10:03:47 -0700


pratyakshsharma commented on a change in pull request #3549:
URL: https://github.com/apache/hudi/pull/3549#discussion_r707523359




##########
File path: website/blog/2021-08-27-bulk-insert-sort-modes.md
##########
@@ -0,0 +1,88 @@
+---
+title: "Bulk Insert Sort Modes with Apache Hudi"
+excerpt: "Different sort modes available with BulkInsert"
+author: shivnarayan
+category: blog
+---
+
+Apache Hudi supports a `bulk_insert` operation in addition to "insert" and "upsert" to ingest data into a hudi table.
+There are different sort modes that one could employ while using bulk_insert. This blog will talk about
+different sort modes available out of the box, and how each compares with others.
+<!--truncate-->
+
+Apache Hudi supports “bulk_insert” to assist in initial loading to data to a hudi table. This is expected
+to be faster when compared to using “insert” or “upsert” operations. Bulk insert differs from insert in two
+aspects. Existing records are never looked up with bulk_insert, and some writer side optimizations like

Review comment:
       I still think there is some confusion here. I went through the entire flow of DeltaStreamer, and based on the two lines linked below:
   1. https://github.com/apache/hudi/blob/5d60491f5b76ef0f77174d71567d0673d9315bcd/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L469
   2. https://github.com/apache/hudi/blob/5d60491f5b76ef0f77174d71567d0673d9315bcd/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L597
   
   Both types of deduplication happen for INSERT as well as BULK_INSERT cases. Please correct me if I am still getting it wrong, @nsivabalan.




