yihua commented on code in PR #8157:
URL: https://github.com/apache/hudi/pull/8157#discussion_r1135775680
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -247,13 +247,29 @@ public class HoodieWriteConfig extends HoodieConfig {
public static final ConfigProperty<String> INSERT_PARALLELISM_VALUE =
ConfigProperty
.key("hoodie.insert.shuffle.parallelism")
.defaultValue("0")
-      .withDocumentation("Parallelism for inserting records into the table. Inserts can shuffle data before writing to tune file sizes and optimize the storage layout.");
+      .withDocumentation("Parallelism for inserting records into the table. Inserts can shuffle "
+          + "data before writing to tune file sizes and optimize the storage layout. Before the "
+          + "0.13.0 release, if users did not configure it, Hudi used 200 as the default "
+          + "shuffle parallelism. From 0.13.0 onwards, Hudi by default automatically uses the "
+          + "parallelism deduced by Spark based on the source data. If the shuffle parallelism "
+          + "is explicitly configured by the user, the user-configured parallelism is used. "
+          + "If you observe small files from the insert operation, we suggest configuring this "
+          + "shuffle parallelism explicitly, so that the parallelism is around "
+          + "total_input_data_size/500MB.");
Review Comment:
Makes sense. Fixed now.
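The sizing guidance in the doc string (parallelism around total_input_data_size/500MB) can be sketched as a small helper. This is an illustrative sketch, not Hudi code; the class and method names are hypothetical, and the 500 MB-per-task target is taken from the documentation text above:

```java
public class InsertParallelismSizing {
  // Target roughly 500 MB of input data per shuffle task,
  // per the guidance in the hoodie.insert.shuffle.parallelism doc string.
  private static final long TARGET_BYTES_PER_TASK = 500L * 1024 * 1024;

  // Hypothetical helper: derive a candidate value for
  // hoodie.insert.shuffle.parallelism from the total input size in bytes.
  static int suggestedInsertParallelism(long totalInputBytes) {
    // Ceiling division, with a floor of 1 task.
    long tasks = (totalInputBytes + TARGET_BYTES_PER_TASK - 1) / TARGET_BYTES_PER_TASK;
    return (int) Math.max(1, tasks);
  }

  public static void main(String[] args) {
    // e.g. 100 GiB of input -> 102400 MiB / 500 MiB per task, rounded up
    System.out.println(suggestedInsertParallelism(100L * 1024 * 1024 * 1024)); // prints 205
  }
}
```

The resulting number would then be passed explicitly as `hoodie.insert.shuffle.parallelism` in the write options when small files are observed, instead of relying on the Spark-deduced default.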