reswqa commented on code in PR #20320:
URL: https://github.com/apache/flink/pull/20320#discussion_r925818232
##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartitionReadScheduler.java:
##########
@@ -152,12 +160,34 @@ class SortMergeResultPartitionReadScheduler implements Runnable, BufferRecycler
@Override
public synchronized void run() {
- Queue<SortMergeSubpartitionReader> availableReaders = getAvailableReaders();
-
- Queue<MemorySegment> buffers = allocateBuffers(availableReaders);
+ Set<SortMergeSubpartitionReader> finishedReaders = new HashSet<>();
+ Queue<MemorySegment> buffers;
+ try {
+ buffers = allocateBuffers();
+ } catch (Throwable throwable) {
+ // fail all pending subpartition readers immediately if any exception occurs
+ LOG.error("Failed to request buffers for data reading.", throwable);
+ failSubpartitionReaders(getAllReaders(), throwable);
Review Comment:
If a failed reader is not removed from `sortedReaders` in time, it can only be
removed after the next poll, but in that case the failed reader will still read
data from disk, which is unnecessary overhead. IMO, we need to clear
`sortedReaders` when `allocateBuffers` throws an exception.
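
A minimal sketch of this suggestion, assuming the scheduler keeps its pending readers in a queue named `sortedReaders`. The classes `SketchReader` and `AllocationFailureSketch` and the method `onAllocationFailure` are hypothetical stand-ins for illustration, not the actual Flink code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for SortMergeSubpartitionReader. */
final class SketchReader {
    private Throwable failureCause;

    void fail(Throwable cause) {
        this.failureCause = cause;
    }

    boolean isFailed() {
        return failureCause != null;
    }
}

/** Hypothetical stand-in for the read scheduler; only the failure path is modeled. */
final class AllocationFailureSketch {
    // Assumed to play the role of the `sortedReaders` queue from the review comment.
    private final Queue<SketchReader> sortedReaders = new ArrayDeque<>();

    void addReader(SketchReader reader) {
        sortedReaders.add(reader);
    }

    /** Returns the next pending reader, or null if none is left. */
    SketchReader pollNext() {
        return sortedReaders.poll();
    }

    /** Called when buffer allocation throws: drop all pending readers, then fail them. */
    void onAllocationFailure(Throwable cause) {
        List<SketchReader> toFail = new ArrayList<>(sortedReaders);
        // Clearing here means no failed reader can be polled (and read from disk) later.
        sortedReaders.clear();
        for (SketchReader reader : toFail) {
            reader.fail(cause);
        }
    }
}
```

With this shape, a `pollNext()` call after the failure returns `null` rather than handing back a reader that has already been failed.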
##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartitionReadScheduler.java:
##########
@@ -283,13 +284,26 @@ private void mayNotifyReleased() {
}
}
- private Queue<SortMergeSubpartitionReader> getAvailableReaders() {
+ private Queue<SortMergeSubpartitionReader> getAllReaders() {
synchronized (lock) {
if (isReleased) {
return new ArrayDeque<>();
}
+ return new ArrayDeque<>(allReaders);
+ }
+ }
- return new PriorityQueue<>(allReaders);
+ @Nullable
+ private SortMergeSubpartitionReader getNextReaderToRead(
Review Comment:
```suggestion
private SortMergeSubpartitionReader addPreviousAndGetNextReader(
```
##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartitionReadScheduler.java:
##########
@@ -152,12 +160,34 @@ class SortMergeResultPartitionReadScheduler implements Runnable, BufferRecycler
@Override
public synchronized void run() {
- Queue<SortMergeSubpartitionReader> availableReaders = getAvailableReaders();
-
- Queue<MemorySegment> buffers = allocateBuffers(availableReaders);
+ Set<SortMergeSubpartitionReader> finishedReaders = new HashSet<>();
+ Queue<MemorySegment> buffers;
+ try {
+ buffers = allocateBuffers();
+ } catch (Throwable throwable) {
+ // fail all pending subpartition readers immediately if any exception occurs
+ LOG.error("Failed to request buffers for data reading.", throwable);
+ failSubpartitionReaders(getAllReaders(), throwable);
Review Comment:
`release` and `releaseSubpartitionReader` should also take care of
`sortedReaders`. These two methods add the reader to the `failedReaders` set, so
I suggest adding a check to `getNextReaderToRead`: when the polled reader is
contained in `failedReaders`, we can ignore it and keep polling until we get a
reader that has not failed.
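
A rough sketch of the suggested check, assuming readers are ordered in a priority queue and failed readers are tracked in a set. The class `FailedReaderSkippingSketch` is hypothetical, and readers are modeled as plain integer ids rather than the real `SortMergeSubpartitionReader`:

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical stand-in for the read scheduler; readers are modeled as integer ids. */
final class FailedReaderSkippingSketch {
    // Assumed to play the roles of `sortedReaders` and `failedReaders` from the comment.
    private final PriorityQueue<Integer> sortedReaders = new PriorityQueue<>();
    private final Set<Integer> failedReaders = new HashSet<>();

    void addReader(int readerId) {
        sortedReaders.add(readerId);
    }

    /** Models what release / releaseSubpartitionReader do: record the reader as failed. */
    void markFailed(int readerId) {
        failedReaders.add(readerId);
    }

    /** Polls until a reader that has not failed is found; returns null if none remains. */
    Integer getNextReaderToRead() {
        Integer next;
        while ((next = sortedReaders.poll()) != null) {
            if (!failedReaders.contains(next)) {
                return next;
            }
            // The polled reader already failed: drop it and try the next one.
        }
        return null;
    }
}
```

The failed readers are removed lazily, as a side effect of polling, so `release` and `releaseSubpartitionReader` do not need to search the queue themselves.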