scxwhite commented on code in PR #7309:
URL: https://github.com/apache/hudi/pull/7309#discussion_r1035551565
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -265,6 +265,13 @@ public class HoodieWriteConfig extends HoodieConfig {
       .withDocumentation("When inserted records share same key, controls whether they should be first combined (i.e de-duplicated) before"
           + " writing to storage.");
+  public static final ConfigProperty<String> PERSIST_BEFORE_INSERT = ConfigProperty
Review Comment:
Thank you for your reply.
You mentioned two points here.
- The first is: "we should refrain from adding another config here"
The reason for adding this configuration is that for non-sorted bulk_insert and other write operations without additional intermediate operations, persistence may not be necessary.
- The second is: "We should optimize the DAG instead"
The underlying principles of Hudi are not clear to data developers. They may not realize that even if they only need to upsert once, they need to cache the data to speed up writing. This will also increase the development cost for data developers.
Just my own opinion; I look forward to your reply.
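To make the caching point above concrete, here is a minimal plain-Java analogy (not Hudi or Spark code; the class and method names are invented for illustration). A lazily defined computation is re-executed on every use unless its result is materialized once, just as a Spark Dataset is recomputed by each action unless it is persisted before the upsert pipeline runs over it more than once.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Analogy for the persist-before-insert discussion: counts how many
// times an "expensive" lazy computation actually runs when it is
// consumed twice (e.g. once for de-duplication, once for the write).
public class PersistAnalogy {

    static int computationsFor(boolean persist) {
        AtomicInteger count = new AtomicInteger();

        // Simulates an expensive, lazily evaluated transformation.
        Supplier<Integer> base = () -> {
            count.incrementAndGet();   // count each recomputation
            return 42;
        };

        Supplier<Integer> source;
        if (persist) {
            int value = base.get();    // materialize once, like persist()
            source = () -> value;      // later uses reuse the cached value
        } else {
            source = base;             // left lazy: recomputed per use
        }

        source.get();                  // first "action" over the data
        source.get();                  // second "action" over the data
        return count.get();            // 2 without persist, 1 with it
    }

    public static void main(String[] args) {
        System.out.println("lazy: " + computationsFor(false)
            + ", persisted: " + computationsFor(true));
    }
}
```

The same trade-off motivates making persistence configurable: when the plan is only consumed once (e.g. a straight bulk_insert), the materialization step is pure overhead.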
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]