Zouxxyy commented on code in PR #7343:
URL: https://github.com/apache/hudi/pull/7343#discussion_r1041044270
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -143,6 +143,15 @@ public class HoodieClusteringConfig extends HoodieConfig {
.sinceVersion("0.9.0")
.withDocumentation("Config to control frequency of async clustering");
+  public static final ConfigProperty<Integer> CLUSTERING_MAX_THREADS = ConfigProperty
+      .key("hoodie.clustering.max.threads")
+      .defaultValue(10)
+      .sinceVersion("0.13.0")
+      .withDocumentation("Maximum number of parallel jobs submitted in a clustering operation. "
+          + "If resources are sufficient (e.g. the Spark engine has enough idle executors), "
+          + "increasing this value will let the clustering job run faster, while putting "
+          + "additional pressure on the execution engine to manage more concurrently "
+          + "running jobs.");
+
Review Comment:
In my opinion, adding more parameters increases the burden on users. I also think
10 may be too small as a default; it might be better to derive the value from the
number of CPU cores or the Spark parallelism, with the number of clustering
groups as the upper limit.
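
The suggestion above could be sketched roughly as follows. This is a hypothetical illustration, not actual Hudi code: `pickMaxThreads`, its parameters, and the class name are all made up for the example; it derives the parallelism from the engine's configured parallelism (falling back to CPU cores) and caps it at the number of clustering groups.

```java
// Hypothetical sketch of the reviewer's suggestion: instead of a fixed
// default of 10, derive clustering parallelism from available resources
// and cap it by the number of clustering groups.
public class ClusteringParallelism {

  /**
   * Picks the number of concurrent clustering jobs (illustrative only).
   *
   * @param sparkDefaultParallelism engine parallelism, or 0 if unknown
   * @param numClusteringGroups    number of clustering groups in the plan
   */
  static int pickMaxThreads(int sparkDefaultParallelism, int numClusteringGroups) {
    int cores = Runtime.getRuntime().availableProcessors();
    // Prefer the engine's configured parallelism; fall back to CPU cores.
    int base = sparkDefaultParallelism > 0 ? sparkDefaultParallelism : cores;
    // Never run more concurrent jobs than there are clustering groups,
    // and always run at least one.
    return Math.max(1, Math.min(base, numClusteringGroups));
  }

  public static void main(String[] args) {
    // 200 configured slots, but only 8 groups to cluster -> capped at 8.
    System.out.println(pickMaxThreads(200, 8));
    // Unknown engine parallelism -> falls back to local CPU core count.
    System.out.println(pickMaxThreads(0, 100));
  }
}
```

This keeps the effective concurrency bounded by real work (the group count) rather than by an arbitrary constant, which addresses both concerns in the comment.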
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]