RocMarshal commented on code in PR #27446: URL: https://github.com/apache/flink/pull/27446#discussion_r2749534711
##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/AdaptiveLoadBasedRecordWriter.java:
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.api.writer;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.core.io.IOReadableWritable;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/** A record writer based on load of downstream tasks. */
+@Internal
+public final class AdaptiveLoadBasedRecordWriter<T extends IOReadableWritable>
+        extends RecordWriter<T> {
+
+    private final int maxTraverseSize;
+
+    private int currentChannel = -1;
+
+    private final int numberOfSubpartitions;
+
+    AdaptiveLoadBasedRecordWriter(
+            ResultPartitionWriter writer, long timeout, String taskName, int maxTraverseSize) {
+        super(writer, timeout, taskName);
+        this.numberOfSubpartitions = writer.getNumberOfSubpartitions();
+        this.maxTraverseSize = Math.min(maxTraverseSize, numberOfSubpartitions);

Review Comment:
Hi @davidradl, thanks for your comments.

> I am wondering why we need numberOfSubpartitions and the maxTraverseSize. Why not set numberOfSubpartitions to Math.min(maxTraverseSize, numberOfSubpartitions) and remove private final int maxTraverseSize;? Then you do not need to check the maxTraverseSize in the logic, as the numberOfSubpartitions will always be the minimum, accounting for the maxTraverseSize.

`numberOfSubpartitions` is the number of downstream partitions that can be written to. `maxTraverseSize`, on the other hand, is the maximum number of partitions that the current partition selector can compare when performing `rescale` or `rebalance`.

Suppose `numberOfSubpartitions = 6` and `maxTraverseSize = 2`. If `numberOfSubpartitions` were set to `Math.min(maxTraverseSize, numberOfSubpartitions)`, it would become `2`, and the writer would never write data to the other `4` downstream partitions, which is not the expected behavior.

> Also on a previous response to a review comment you said maxTraverseSize could not be 1, but it could end as one if numberOfSubpartitions == 1 due to this Math.min. We should probably check for the numberOfSubpartitions == 1 case and not do adaptive processing.

When the number of downstream partitions is 1, setting `maxTraverseSize` to a value greater than 1 is meaningless: there is only one downstream partition, so no traversal or comparison is needed and the only available partition is selected directly.

When the number of downstream partitions is greater than 1 and the user explicitly sets `maxTraverseSize` to 1, the next partition is selected directly without any load calculation, and data is written to it immediately. This behavior is equivalent to not enabling the adaptive partition feature.

Therefore, when we previously said that `maxTraverseSize` cannot be 1, we meant that users are not allowed to configure this option with a value of 1. It does not mean that the internal `maxTraverseSize` can never be 1. As explained above, the internal `maxTraverseSize` becomes 1 only when the number of downstream partitions is `1`.
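
To make the distinction concrete, here is a minimal sketch of a bounded, load-based selection. It is only an illustration under stated assumptions, not the code in this PR: the class `TraverseSketch`, the methods `selectChannel` and `channelsReached`, and the use of a plain `int[]` record count as the "load" are all hypothetical. The real writer may measure load differently.

```java
import java.util.Arrays;

/** Hypothetical sketch; names and the load metric are assumptions, not the PR's code. */
final class TraverseSketch {

    /**
     * Picks the least-loaded channel among at most maxTraverseSize candidates,
     * starting right after currentChannel and wrapping around all channels.
     */
    static int selectChannel(int currentChannel, int[] load, int maxTraverseSize) {
        int n = load.length;
        // Same clamping as in the constructor under review.
        int window = Math.min(maxTraverseSize, n);
        int best = (currentChannel + 1) % n;
        for (int i = 1; i < window; i++) {
            int candidate = (currentChannel + 1 + i) % n;
            if (load[candidate] < load[best]) {
                best = candidate;
            }
        }
        return best;
    }

    /** Counts how many distinct channels receive at least one of the written records. */
    static int channelsReached(int numberOfSubpartitions, int maxTraverseSize, int records) {
        // Assumption: load is simply the number of records written so far.
        int[] load = new int[numberOfSubpartitions];
        int current = -1;
        for (int r = 0; r < records; r++) {
            current = selectChannel(current, load, maxTraverseSize);
            load[current]++;
        }
        return (int) Arrays.stream(load).filter(l -> l > 0).count();
    }
}
```

In this sketch, `channelsReached(6, 2, 60)` visits all 6 channels because the comparison window slides over every channel, whereas shrinking the channel count itself to `Math.min(2, 6)`, i.e. `channelsReached(2, 2, 60)`, only ever touches 2. With a single channel the window collapses to 1 and that channel is returned without any comparison.
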
The number of downstream partitions is not always determined by the user. For example, when a streaming job enables the `adaptive scheduler`, the parallelism of each operator or task may differ, so the number of downstream partitions of certain tasks is not under the user's control. As a result, `maxTraverseSize` inside the writer may become 1 in such cases.

Please correct me if I'm wrong. Any input is appreciated!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
