jiangxin369 commented on code in PR #23851:
URL: https://github.com/apache/flink/pull/23851#discussion_r1505305577
##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/BufferManager.java:
##########
@@ -132,15 +149,73 @@ private boolean shouldContinueRequest(BufferPool bufferPool) {
}
}
- /** Requests exclusive buffers from the provider. */
- void requestExclusiveBuffers(int numExclusiveBuffers) throws IOException {
- checkArgument(numExclusiveBuffers >= 0, "Num exclusive buffers must be non-negative.");
+ private void resizeBufferQueue() {
+ if (shouldRequestExclusiveBufferFromGlobal()) {
+ return;
+ }
+
+ SingleInputGate inputGate = inputChannel.inputGate;
+ int currentSize = inputGate.getBufferPool().getNumBuffers();
+ int numRemoteChannels =
+ inputGate.getNumberOfInputChannels() - inputGate.getNumberOfLocalInputChannels();
Review Comment:
Yes, it is. `NumberOfLocalInputChannels` is updated only while setting up a
`KnownInputChannel` or converting an `UnknownInputChannel` into a known one.
The netty thread in `BufferManager` starts working only after these phases, so
two threads can never access `NumberOfLocalInputChannels` simultaneously.
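
To illustrate the reasoning, here is a minimal Java sketch of a field that is written only during a setup phase and read by a worker thread started afterwards. `GateSketch` and `PhaseConfinementDemo` are hypothetical names, not Flink classes, and the sketch assumes the reader thread really is started after setup completes, which is the ordering described above rather than something this example proves about Flink.

```java
/** Hypothetical stand-in for an input gate; not Flink's SingleInputGate. */
final class GateSketch {
    private final int numChannels;

    // Plain (non-volatile) field: written only during the setup phase.
    private int numLocalChannels;

    GateSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    /** Setup phase: called only by the setup thread, before any reader starts. */
    void registerLocalChannel() {
        numLocalChannels++;
    }

    /** Read phase: safe without a lock only once setup has finished. */
    int numRemoteChannels() {
        return numChannels - numLocalChannels;
    }
}

final class PhaseConfinementDemo {
    /** Reads the remote-channel count from a worker thread started after setup. */
    static int readFromWorker(GateSketch gate) {
        int[] result = new int[1];
        // Thread.start() happens-before every action in the started thread,
        // so the worker sees all writes made during setup.
        Thread worker = new Thread(() -> result[0] = gate.numRemoteChannels());
        worker.start();
        try {
            // Thread.join() happens-before the read of result[0] below.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        GateSketch gate = new GateSketch(5);
        gate.registerLocalChannel();
        gate.registerLocalChannel();
        System.out.println(readFromWorker(gate));
    }
}
```

If a channel could still be converted after the worker thread has started, this argument would no longer hold and the field would need synchronization or an atomic type.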
##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/buffer/LocalBufferPool.java:
##########
@@ -272,16 +273,22 @@ public void reserveSegments(int numberOfSegmentsToReserve) throws IOException {
"Can not reserve more segments than number of minimum segments.");
CompletableFuture<?> toNotify = null;
+
+ int numSegmentsNeeded;
synchronized (availableMemorySegments) {
checkDestroyed();
+ numSegmentsNeeded = numberOfSegmentsToReserve - numberOfRequestedMemorySegments;
+ }
- if (numberOfRequestedMemorySegments < numberOfSegmentsToReserve) {
- availableMemorySegments.addAll(
- networkBufferPool.requestPooledMemorySegmentsBlocking(
- numberOfSegmentsToReserve - numberOfRequestedMemorySegments));
+ if (numSegmentsNeeded > 0) {
Review Comment:
It happens when all buffers in the global buffer pool are used up, even though
every LBP (`LocalBufferPool`) has been allocated only its `min` buffers. If an
InputGate is reserving segments at this point, it has to wait for another LBP
to be destroyed. Destroying an LBP triggers a redistribution over all LBPs,
which requires the same lock that is held while reserving segments, so a
deadlock occurs.
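
A minimal Java sketch of the pattern in the diff above: compute the number of missing segments under the lock, then make the blocking request without holding it. All class and method names here (`GlobalPoolSketch`, `LocalPoolSketch`, `ReserveOutsideLockDemo`) are hypothetical, the global pool is modeled with a `Semaphore`, and the sketch ignores concurrent `reserve` calls on the same pool, so it only illustrates the locking idea and is not Flink's implementation.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical global pool; each permit stands for one free memory segment. */
final class GlobalPoolSketch {
    private final Semaphore free;

    GlobalPoolSketch(int segments) {
        this.free = new Semaphore(segments);
    }

    /** Blocks until n segments are free; interruption is ignored in this sketch. */
    void requestBlocking(int n) {
        free.acquireUninterruptibly(n);
    }

    void recycle(int n) {
        free.release(n);
    }
}

/** Hypothetical local pool; not Flink's LocalBufferPool. */
final class LocalPoolSketch {
    private final Object lock = new Object();
    private final GlobalPoolSketch global;
    private int requested; // guarded by lock

    LocalPoolSketch(GlobalPoolSketch global) {
        this.global = global;
    }

    void reserve(int target) {
        int needed;
        synchronized (lock) {
            needed = target - requested;
        }
        if (needed > 0) {
            // Blocking call made WITHOUT holding the lock, so redistribute()
            // on this pool can still proceed while we wait.
            global.requestBlocking(needed);
            synchronized (lock) {
                requested += needed;
            }
        }
    }

    /** Stand-in for redistribution: takes the same lock as reserve(). */
    void redistribute() {
        synchronized (lock) {
            // A real pool would adjust its size here.
        }
    }

    /** Returns all segments held by this pool to the global pool. */
    void destroy() {
        int returned;
        synchronized (lock) {
            returned = requested;
            requested = 0;
        }
        global.recycle(returned);
    }

    int requested() {
        synchronized (lock) {
            return requested;
        }
    }
}

final class ReserveOutsideLockDemo {
    static boolean run() {
        GlobalPoolSketch global = new GlobalPoolSketch(2);
        LocalPoolSketch a = new LocalPoolSketch(global);
        LocalPoolSketch b = new LocalPoolSketch(global);
        a.reserve(2); // the global pool is now exhausted

        Thread reserver = new Thread(() -> b.reserve(2)); // waits for free segments
        reserver.setDaemon(true);
        reserver.start();

        // Destroying pool a: redistribute over the remaining pool first
        // (needs b's lock), then hand a's segments back to the global pool.
        b.redistribute();
        a.destroy();

        try {
            reserver.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reserver.isAlive() && b.requested() == 2;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Had `reserve` kept the lock while waiting in `requestBlocking`, the call to `b.redistribute()` could block behind it and `a.destroy()` would never run, which is the cycle described in the comment.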