zhangjun0x01 commented on code in PR #584:
URL: https://github.com/apache/flink-table-store/pull/584#discussion_r1138150614
##########
flink-table-store-flink/flink-table-store-flink-common/src/main/java/org/apache/flink/table/store/connector/FlinkConnectorOptions.java:
##########
@@ -82,6 +82,20 @@ public class FlinkConnectorOptions {
+ "By default, if this option is not
defined, the planner will derive the parallelism "
+ "for each statement individually by also
considering the global configuration.");
+ public static final ConfigOption<Boolean> INFER_SCAN_PARALLELISM =
+ ConfigOptions.key("scan.infer-parallelism")
+ .booleanType()
+ .defaultValue(false)
+ .withDescription(
+ "If it is false, parallelism of source are set by
scan.parallelism. "
+ + "If it is true, source parallelism is
inferred according to splits number (batch mode) or bucket number(streaming
mode).");
+
+ public static final ConfigOption<Integer> INFER_SCAN_PARALLELISM_MAX =
+ ConfigOptions.key("scan.infer-parallelism.max")
Review Comment:
The split number is affected by `source.split.target-size` and `source.split.open-file-cost`, and in most cases that is fine. But when the table data is large, the inferred number of splits (and hence the inferred parallelism) can become very large, while the user may not want a single job to occupy too many resources. An upper limit keeps the resource utilization of most tasks within a reasonable range, and at the same time prevents the cluster from being filled up just because one job has too many splits.
Refer to `https://github.com/apache/flink/blob/cf358d7d55ca48b9d25e5217006898e3070a85ad/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveOptions.java#L61` for a similar cap in the Hive connector.
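
To make the capping behavior concrete, here is a minimal sketch under the assumption that the inferred value is simply bounded by `scan.infer-parallelism.max`. The helper class, method name, and `-1` sentinel are made up for illustration and are not code from this PR; only the two config options come from the diff above.

```java
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.store.connector.FlinkConnectorOptions;

/** Illustrative sketch only, not code from this PR. */
public final class InferParallelismSketch {

    /**
     * Derives the source parallelism from the split count (batch mode) or bucket
     * count (streaming mode) and bounds it with scan.infer-parallelism.max.
     * Returns -1 when inference is disabled, meaning "use scan.parallelism instead".
     */
    public static int inferredParallelism(ReadableConfig conf, int splitOrBucketCount) {
        if (!conf.get(FlinkConnectorOptions.INFER_SCAN_PARALLELISM)) {
            return -1;
        }
        int inferred = Math.max(1, splitOrBucketCount);
        Integer max = conf.get(FlinkConnectorOptions.INFER_SCAN_PARALLELISM_MAX);
        // Cap the inferred value so a single large table cannot occupy the whole cluster.
        return max == null ? inferred : Math.min(inferred, max);
    }
}
```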