JingsongLi commented on a change in pull request #17939:
URL: https://github.com/apache/flink/pull/17939#discussion_r765481583
##########
File path:
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecSink.java
##########
@@ -280,26 +297,29 @@ private int deriveSinkParallelism(
* messages.
*/
private Transformation<RowData> applyKeyBy(
- ChangelogMode changelogMode,
+ TableConfig config,
Transformation<RowData> inputTransform,
int[] primaryKeys,
int sinkParallelism,
- boolean upsertMaterialize) {
- final int inputParallelism = inputTransform.getParallelism();
- if ((inputParallelism == sinkParallelism || changelogMode.containsOnly(RowKind.INSERT))
- && !upsertMaterialize) {
- return inputTransform;
+ int inputParallelism,
+ boolean inputInsertOnly,
+ boolean needMaterialize) {
+ final ExecutionConfigOptions.SinkKeyedShuffle sinkShuffleByPk =
+ config.getConfiguration().get(ExecutionConfigOptions.TABLE_EXEC_SINK_KEYED_SHUFFLE);
+ boolean sinkKeyBy = false;
+ switch (sinkShuffleByPk) {
+ case NONE:
+ break;
+ case AUTO:
+ sinkKeyBy = inputInsertOnly && sinkParallelism != inputParallelism;
+ break;
+ case FORCE:
+ // single parallelism has no problem
+ sinkKeyBy = sinkParallelism != 1 || inputParallelism != 1;
Review comment:
I was torn about this condition because I originally thought we
shouldn't special-case single parallelism. But many streaming jobs run
with a parallelism of 1, so it is worth skipping the keyBy when both the
input and the sink have a parallelism of 1.