reswqa commented on code in PR #20320:
URL: https://github.com/apache/flink/pull/20320#discussion_r925823372
##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartitionReadScheduler.java:
##########
@@ -152,12 +160,34 @@ class SortMergeResultPartitionReadScheduler implements Runnable, BufferRecycler
     @Override
     public synchronized void run() {
-        Queue<SortMergeSubpartitionReader> availableReaders = getAvailableReaders();
-
-        Queue<MemorySegment> buffers = allocateBuffers(availableReaders);
+        Set<SortMergeSubpartitionReader> finishedReaders = new HashSet<>();
+        Queue<MemorySegment> buffers;
+        try {
+            buffers = allocateBuffers();
+        } catch (Throwable throwable) {
+            // fail all pending subpartition readers immediately if any exception occurs
+            LOG.error("Failed to request buffers for data reading.", throwable);
+            failSubpartitionReaders(getAllReaders(), throwable);
Review Comment:
`release` and `releaseSubpartitionReader` should take care of `sortedReaders` as well. Both methods add the reader to the `failedReaders` set, so we should filter out useless readers as early as possible. For example (this may not be the best solution), we could add a check to `getNextReaderToRead`: when the polled reader is contained in `failedReaders`, ignore it and poll the next one until a non-failed reader is found. However, `removeFinishedAndFailedReaders` clears `failedReaders`, so some failed readers may not yet have been polled from `sortedReaders` in the current round of scheduling. We could instead remove a failed reader from `failedReaders` in `getNextReaderToRead` every time a polled reader turns out to be failed, rather than in `removeFinishedAndFailedReaders`.
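To illustrate, here is a minimal standalone sketch of the skip-and-remove idea, assuming `sortedReaders` is a priority queue and `failedReaders` a set, as discussed above. The `ReaderScheduler` class and `Reader` type parameter are hypothetical stand-ins for `SortMergeResultPartitionReadScheduler` and `SortMergeSubpartitionReader`, not the actual Flink code:

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.Set;

/**
 * Hypothetical standalone sketch: poll readers in priority order, skipping
 * any that failed, and remove each failed reader from failedReaders at the
 * moment it is skipped instead of clearing the set in bulk later.
 */
class ReaderScheduler<Reader extends Comparable<Reader>> {
    private final Queue<Reader> sortedReaders = new PriorityQueue<>();
    private final Set<Reader> failedReaders = new HashSet<>();

    void addReader(Reader reader) {
        sortedReaders.add(reader);
    }

    /** Called from release paths; marks a reader so it is skipped later. */
    void markFailed(Reader reader) {
        failedReaders.add(reader);
    }

    /** Returns the next healthy reader, or null if none remain. */
    Reader getNextReaderToRead() {
        Reader reader;
        while ((reader = sortedReaders.poll()) != null) {
            // Remove the failed reader here, instead of in a bulk
            // removeFinishedAndFailedReaders-style cleanup, so no failed
            // reader lingers in failedReaders across scheduling rounds.
            if (failedReaders.remove(reader)) {
                continue; // skip readers that were failed/released
            }
            return reader;
        }
        return null;
    }
}
```

This way `failedReaders` only ever holds readers still waiting to be drained from `sortedReaders`, and no separate clearing step is needed.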
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]