zhangjun0x01 commented on code in PR #584:
URL: https://github.com/apache/flink-table-store/pull/584#discussion_r1138150614
##########
flink-table-store-flink/flink-table-store-flink-common/src/main/java/org/apache/flink/table/store/connector/FlinkConnectorOptions.java:
##########
@@ -82,6 +82,20 @@ public class FlinkConnectorOptions {
+ "By default, if this option is not
defined, the planner will derive the parallelism "
+ "for each statement individually by also
considering the global configuration.");
+ public static final ConfigOption<Boolean> INFER_SCAN_PARALLELISM =
+ ConfigOptions.key("scan.infer-parallelism")
+ .booleanType()
+ .defaultValue(false)
+ .withDescription(
+ "If it is false, parallelism of source are set by
scan.parallelism. "
+ + "If it is true, source parallelism is
inferred according to splits number (batch mode) or bucket number(streaming
mode).");
+
+ public static final ConfigOption<Integer> INFER_SCAN_PARALLELISM_MAX =
+ ConfigOptions.key("scan.infer-parallelism.max")
Review Comment:
The split number is affected by `source.split.target-size` and `source.split.open-file-cost`, and in most cases that is fine. But when the table data is large, the inferred number of splits (and hence the inferred parallelism) can become very large, while the user may not want a single job to occupy too many resources. An upper limit keeps the resource utilization of most tasks within a reasonable range, and at the same time prevents the cluster from being filled up just because one job has too many splits.
Refer to `https://github.com/apache/flink/blob/cf358d7d55ca48b9d25e5217006898e3070a85ad/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveOptions.java#L61` for a similar cap in the Hive connector.
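
To make the capping behavior concrete, here is a minimal sketch under the assumption that the inferred value is simply bounded by `scan.infer-parallelism.max`. The helper class, method name, and `-1` sentinel are made up for illustration and are not code from this PR; only the two config options come from the diff above.

```java
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.store.connector.FlinkConnectorOptions;

/** Illustrative sketch only, not code from this PR. */
public final class InferParallelismSketch {

    /**
     * Derives the source parallelism from the split count (batch mode) or bucket
     * count (streaming mode) and bounds it with scan.infer-parallelism.max.
     * Returns -1 when inference is disabled, meaning "use scan.parallelism instead".
     */
    public static int inferredParallelism(ReadableConfig conf, int splitOrBucketCount) {
        if (!conf.get(FlinkConnectorOptions.INFER_SCAN_PARALLELISM)) {
            return -1;
        }
        int inferred = Math.max(1, splitOrBucketCount);
        Integer max = conf.get(FlinkConnectorOptions.INFER_SCAN_PARALLELISM_MAX);
        // Cap the inferred value so a single large table cannot occupy the whole cluster.
        return max == null ? inferred : Math.min(inferred, max);
    }
}
```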