boneanxs commented on code in PR #7343:
URL: https://github.com/apache/hudi/pull/7343#discussion_r1241140092


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -143,6 +143,15 @@ public class HoodieClusteringConfig extends HoodieConfig {
       .sinceVersion("0.9.0")
       .withDocumentation("Config to control frequency of async clustering");
 
+  public static final ConfigProperty<Integer> CLUSTERING_MAX_THREADS = ConfigProperty
+      .key("hoodie.clustering.max.threads")
+      .defaultValue(10)
+      .sinceVersion("0.13.0")
+      .withDocumentation("Maximum number of parallel jobs submitted in a clustering operation. "
+          + "If resources are sufficient (e.g., the Spark engine has enough idle executors), increasing this "
+          + "value will let the clustering job run faster, while putting additional pressure on the "
+          + "execution engine to manage more concurrently running jobs.");
+
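To illustrate the defaulting behavior of the new property, here is a minimal plain-Java sketch (not Hudi's actual `ConfigProperty` implementation; the class and method names below are hypothetical):

```java
import java.util.Properties;

public class ClusteringMaxThreads {
  static final String KEY = "hoodie.clustering.max.threads";
  static final int DEFAULT = 10;

  // Resolve the configured value, falling back to the default of 10 when unset.
  static int resolve(Properties props) {
    String v = props.getProperty(KEY);
    return v == null ? DEFAULT : Integer.parseInt(v);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolve(props)); // default applies when the key is absent
    props.setProperty(KEY, "32");       // users may set a value larger than the CPU core count
    System.out.println(resolve(props));
  }
}
```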

Review Comment:
   It's difficult for users to work out `the value can be determined by the number of CPU cores or the spark parallelism`, and I think users could set values larger than the number of CPU cores, so we need to support that.
   
   Added a note that the number of clustering groups can be used as the upper limit.
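The capping the reviewer suggests can be sketched as follows (a hypothetical helper, not the actual Hudi code: the configured thread count is bounded above by the number of clustering groups, since extra threads beyond that would be idle):

```java
public class ClusteringParallelism {
  // Hypothetical helper: effective parallelism is the configured
  // hoodie.clustering.max.threads value, capped by the number of
  // clustering groups to be executed.
  static int effectiveParallelism(int configuredMaxThreads, int numClusteringGroups) {
    // Users may set a value larger than the CPU core count; the number of
    // clustering groups serves as the natural upper limit.
    return Math.max(1, Math.min(configuredMaxThreads, numClusteringGroups));
  }

  public static void main(String[] args) {
    System.out.println(effectiveParallelism(10, 4));  // capped by the group count
    System.out.println(effectiveParallelism(10, 32)); // capped by the configured value
  }
}
```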



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
