Re: [PR] [FLINK-38943][runtime] Support Adaptive Partition Selection for RescalePartitioner and RebalancePartitioner [flink]

via GitHub Sat, 31 Jan 2026 04:52:41 -0800


RocMarshal commented on code in PR #27446:
URL: https://github.com/apache/flink/pull/27446#discussion_r2749534711



##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/AdaptiveLoadBasedRecordWriter.java:
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.api.writer;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.core.io.IOReadableWritable;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/** A record writer based on load of downstream tasks. */
+@Internal
+public final class AdaptiveLoadBasedRecordWriter<T extends IOReadableWritable>
+        extends RecordWriter<T> {
+
+    private final int maxTraverseSize;
+
+    private int currentChannel = -1;
+
+    private final int numberOfSubpartitions;
+
+    AdaptiveLoadBasedRecordWriter(
+            ResultPartitionWriter writer, long timeout, String taskName, int maxTraverseSize) {
+        super(writer, timeout, taskName);
+        this.numberOfSubpartitions = writer.getNumberOfSubpartitions();
+        this.maxTraverseSize = Math.min(maxTraverseSize, numberOfSubpartitions);

Review Comment:
   Hi, @davidradl 
   thanks for your comments.
   
   > I am wondering why we need numberOfSubpartitions and the maxTraverseSize. 
Why not set numberOfSubpartitions to Math.min(maxTraverseSize, 
numberOfSubpartitions) and remove private final int maxTraverseSize? Then you 
do not need to check the maxTraverseSize in the logic, as the 
numberOfSubpartitions will always be the minimum, accounting for the 
maxTraverseSize.
   
   `numberOfSubpartitions` represents the number of downstream partitions that 
can be written to.
   
   `maxTraverseSize`, on the other hand, represents the maximum number of 
partitions that the current partition selector can compare when performing 
`rescale` or `rebalance`.
   
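   For example, suppose `numberOfSubpartitions = 6` and `maxTraverseSize = 2`. 
If we overwrote `numberOfSubpartitions` with `Math.min(maxTraverseSize, 
numberOfSubpartitions)`, the writer would treat only `2` subpartitions as 
writable and would never write data to the other `4` downstream partitions, 
which is not the expected behavior.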
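
To make the difference between the two fields concrete, here is a minimal sketch of how a selector could use them. It is not the code in this PR: the class name `AdaptiveSelectionSketch`, the `load` array standing in for the real backlog metric, and the tie-breaking rule (keep the first candidate on equal load) are all assumptions for illustration. `numberOfSubpartitions` is the modulus for wrapping around the channels, while `maxTraverseSize` only bounds how many candidates are compared per record.

```java
import java.util.Arrays;

/** Hypothetical sketch of load-based channel selection; not the PR's implementation. */
final class AdaptiveSelectionSketch {

    private final int numberOfSubpartitions;
    private final int maxTraverseSize;
    private int currentChannel = -1;

    AdaptiveSelectionSketch(int numberOfSubpartitions, int maxTraverseSize) {
        this.numberOfSubpartitions = numberOfSubpartitions;
        // Same clamp as in the constructor under review.
        this.maxTraverseSize = Math.min(maxTraverseSize, numberOfSubpartitions);
    }

    /**
     * Picks the least loaded channel among the next maxTraverseSize candidates.
     * load[i] is an assumed stand-in for the backlog of subpartition i.
     */
    int selectChannel(long[] load) {
        // Wrap around ALL subpartitions, not just the comparison window.
        int start = (currentChannel + 1) % numberOfSubpartitions;
        int best = start;
        for (int i = 1; i < maxTraverseSize; i++) {
            int candidate = (start + i) % numberOfSubpartitions;
            if (load[candidate] < load[best]) {
                best = candidate;
            }
        }
        currentChannel = best;
        return best;
    }

    public static void main(String[] args) {
        AdaptiveSelectionSketch selector = new AdaptiveSelectionSketch(6, 2);
        long[] load = new long[6];
        boolean[] visited = new boolean[6];
        for (int i = 0; i < 12; i++) {
            int channel = selector.selectChannel(load);
            load[channel]++; // pretend each record adds one unit of backlog
            visited[channel] = true;
        }
        System.out.println(Arrays.toString(visited));
    }
}
```

With `numberOfSubpartitions = 6` and `maxTraverseSize = 2`, the window of two candidates slides across all six channels in this sketch, so every subpartition still receives data. Collapsing both values into one field would lose that.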
   
   
   > Also, in a previous response to a review comment you said maxTraverseSize 
could not be 1, but it could end up as 1 if numberOfSubpartitions == 1 due to 
this Math.min. We should probably check for the numberOfSubpartitions == 1 case 
and not do adaptive processing.
   
   When the number of downstream partitions is 1, setting `maxTraverseSize` to 
a value greater than 1 is meaningless, because there is only one downstream 
partition. No additional traversal or comparison is needed, and the only 
available partition can be selected directly.
   In addition, when the number of downstream partitions is not 1 and the user 
explicitly sets `maxTraverseSize` to 1, this means that under this strategy the 
next partition is selected directly without any load calculation, and data is 
written to it immediately. This behavior is equivalent to not enabling the 
adaptive partition feature.
   
   
   Therefore, when we previously said that `maxTraverseSize` cannot be 1, we 
meant that users are not allowed to configure this option with a value of 1. It 
does not mean that the internal `maxTraverseSize` cannot be 1. As explained 
above, when the internal `maxTraverseSize` becomes 1, it is caused by the 
number of downstream partitions being `1`.
   
   The number of downstream partitions is not always determined by user 
operations. For example, when a streaming job enables the `adaptive scheduler`, 
the parallelism of each operator or task may differ, which can lead to an 
uncontrollable number of downstream partitions for certain tasks. As a result, 
`maxTraverseSize` inside the writer may become 1 in such cases.
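
The distinction between the user-facing value and the internal value can be sketched as follows. This is only an illustration under assumptions: the class `TraverseSizeSketch`, both method names, and the exact lower bound of 2 for the configured value are hypothetical and not taken from the PR. Only the `Math.min` clamp mirrors the constructor shown in the diff.

```java
/** Hypothetical helper separating the configured value from the effective one. */
final class TraverseSizeSketch {

    /** Assumed user-facing rule: a configured value of 1 (or less) is rejected. */
    static int validateConfigured(int configured) {
        if (configured < 2) {
            throw new IllegalArgumentException(
                    "max traverse size must be at least 2, but was " + configured);
        }
        return configured;
    }

    /** Internal value: clamped by the number of subpartitions, so it may become 1. */
    static int effectiveTraverseSize(int configured, int numberOfSubpartitions) {
        return Math.min(validateConfigured(configured), numberOfSubpartitions);
    }

    public static void main(String[] args) {
        // Six downstream partitions: the configured value is used as is.
        System.out.println(effectiveTraverseSize(4, 6)); // prints 4
        // One downstream partition: the internal value collapses to 1.
        System.out.println(effectiveTraverseSize(4, 1)); // prints 1
    }
}
```

In this sketch a valid configuration can still yield an internal value of 1, and that happens only when there is a single downstream partition.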
   
   Please correct me if I'm wrong. Any input is appreciated!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
