yihua commented on code in PR #8157:
URL: https://github.com/apache/hudi/pull/8157#discussion_r1135775680
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -247,13 +247,29 @@ public class HoodieWriteConfig extends HoodieConfig {
public static final ConfigProperty<String> INSERT_PARALLELISM_VALUE =
ConfigProperty
.key("hoodie.insert.shuffle.parallelism")
.defaultValue("0")
-      .withDocumentation("Parallelism for inserting records into the table. Inserts can shuffle data before writing to tune file sizes and optimize the storage layout.");
+      .withDocumentation("Parallelism for inserting records into the table. Inserts can shuffle "
+          + "data before writing to tune file sizes and optimize the storage layout. Before the "
+          + "0.13.0 release, if users did not configure it, Hudi used 200 as the default "
+          + "shuffle parallelism. From 0.13.0 onwards, Hudi by default automatically uses the "
+          + "parallelism deduced by Spark based on the source data. If the shuffle parallelism "
+          + "is explicitly configured by the user, the user-configured parallelism is used. "
+          + "If you observe small files from the insert operation, we suggest configuring this "
+          + "shuffle parallelism explicitly, so that the parallelism is around "
+          + "total_input_data_size/500MB.");
Review Comment:
Makes sense. Fixed now.
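The sizing guidance in the doc string (parallelism around total_input_data_size/500MB) can be sketched as a small helper. This is an illustrative sketch, not Hudi code; the class and method names are hypothetical, and the 500 MB-per-task target is taken from the documentation text above:

```java
public class InsertParallelismSizing {
  // Target roughly 500 MB of input data per shuffle task,
  // per the guidance in the hoodie.insert.shuffle.parallelism doc string.
  private static final long TARGET_BYTES_PER_TASK = 500L * 1024 * 1024;

  // Hypothetical helper: derive a candidate value for
  // hoodie.insert.shuffle.parallelism from the total input size in bytes.
  static int suggestedInsertParallelism(long totalInputBytes) {
    // Ceiling division, with a floor of 1 task.
    long tasks = (totalInputBytes + TARGET_BYTES_PER_TASK - 1) / TARGET_BYTES_PER_TASK;
    return (int) Math.max(1, tasks);
  }

  public static void main(String[] args) {
    // e.g. 100 GiB of input -> 102400 MiB / 500 MiB per task, rounded up
    System.out.println(suggestedInsertParallelism(100L * 1024 * 1024 * 1024)); // prints 205
  }
}
```

The resulting number would then be passed explicitly as `hoodie.insert.shuffle.parallelism` in the write options when small files are observed, instead of relying on the Spark-deduced default.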