[GitHub] [flink] JingsongLi commented on a change in pull request #17939: [FLINK-20370][table] part2: introduce 'table.exec.sink.keyed-shuffle' option to auto keyby on sink's pk if parallelism are not the same for insertOnly input

GitBox Wed, 08 Dec 2021 23:06:49 -0800


JingsongLi commented on a change in pull request #17939:
URL: https://github.com/apache/flink/pull/17939#discussion_r765481583




##########
File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecSink.java
##########
@@ -280,26 +297,29 @@ private int deriveSinkParallelism(
      * messages.
      */
     private Transformation<RowData> applyKeyBy(
-            ChangelogMode changelogMode,
+            TableConfig config,
             Transformation<RowData> inputTransform,
             int[] primaryKeys,
             int sinkParallelism,
-            boolean upsertMaterialize) {
-        final int inputParallelism = inputTransform.getParallelism();
-        if ((inputParallelism == sinkParallelism || changelogMode.containsOnly(RowKind.INSERT))
-                && !upsertMaterialize) {
-            return inputTransform;
+            int inputParallelism,
+            boolean inputInsertOnly,
+            boolean needMaterialize) {
+        final ExecutionConfigOptions.SinkKeyedShuffle sinkShuffleByPk =
+                config.getConfiguration().get(ExecutionConfigOptions.TABLE_EXEC_SINK_KEYED_SHUFFLE);
+        boolean sinkKeyBy = false;
+        switch (sinkShuffleByPk) {
+            case NONE:
+                break;
+            case AUTO:
+                sinkKeyBy = inputInsertOnly && sinkParallelism != inputParallelism;
+                break;
+            case FORCE:
+                // single parallelism has no problem
+                sinkKeyBy = sinkParallelism != 1 || inputParallelism != 1;

Review comment:
       I was torn about this condition because I originally thought we
shouldn't special-case single parallelism. But single-parallelism jobs are
very common in stream processing, so it is worth skipping the keyBy when
both the input and the sink have parallelism 1.
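
For illustration, the decision made by the switch in the hunk above can be sketched in isolation. This is only a rough, stand-alone approximation: the class `SinkKeyedShuffleSketch`, the method `shouldKeyBy`, and the local enum are hypothetical names, the enum merely stands in for `ExecutionConfigOptions.SinkKeyedShuffle`, and the sketch ignores `needMaterialize` and whatever follows the part of the method shown in the diff.

```java
// Hypothetical stand-alone sketch; NOT the actual CommonExecSink code.
final class SinkKeyedShuffleSketch {

    /** Local stand-in (assumption) for ExecutionConfigOptions.SinkKeyedShuffle. */
    enum SinkKeyedShuffle { NONE, AUTO, FORCE }

    /**
     * Mirrors only the switch shown in the diff: decides whether a keyBy on the
     * sink's primary key would be requested for the given mode.
     */
    static boolean shouldKeyBy(
            SinkKeyedShuffle mode,
            boolean inputInsertOnly,
            int inputParallelism,
            int sinkParallelism) {
        switch (mode) {
            case AUTO:
                // Only insert-only input whose parallelism differs from the sink's.
                return inputInsertOnly && sinkParallelism != inputParallelism;
            case FORCE:
                // Skipped only when both sides run with parallelism 1.
                return sinkParallelism != 1 || inputParallelism != 1;
            case NONE:
            default:
                return false;
        }
    }
}
```

Under this sketch, `FORCE` with input and sink parallelism both equal to 1 yields no keyBy, which is the single-parallelism case the comment is about; `FORCE` with any other combination still requests it.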




