Github user zhijiangW commented on a diff in the pull request:
https://github.com/apache/flink/pull/4509#discussion_r141794108
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java ---
@@ -390,7 +390,63 @@ public BufferProvider getBufferProvider() throws IOException {
 		return inputGate.getBufferProvider();
 	}
-	public void onBuffer(Buffer buffer, int sequenceNumber) {
+	/**
+	 * Requests buffer from input channel directly for receiving network data.
+	 * It should always return an available buffer in credit-based mode.
+	 *
+	 * @return The available buffer.
+	 */
+	public Buffer requestBuffer() {
+		synchronized (availableBuffers) {
+			return availableBuffers.poll();
+		}
+	}
+
+	/**
+	 * Receives the backlog from producer's buffer response. If the number of available
+	 * buffers is less than the backlog length, it will request floating buffers from buffer
+	 * pool, and then notify unannounced credits to the producer.
+	 *
+	 * @param backlog The number of unsent buffers in the producer's sub partition.
+	 */
+	private void onSenderBacklog(int backlog) {
+		int numRequestedBuffers = 0;
+
+		synchronized (availableBuffers) {
+			// Important: the isReleased check should be inside the synchronized block.
+			if (!isReleased.get()) {
+				senderBacklog.set(backlog);
+
+				while (senderBacklog.get() > availableBuffers.size() && !isWaitingForFloatingBuffers.get()) {
--- End diff --
Actually I implemented this strategy in two different ways in our production environment.
On the `LocalBufferPool` side, the pool can assign available buffers among all the listeners in a fair, round-robin way, because it can gather all the listeners within some time window. But triggering the assignment on the `LocalBufferPool` side may introduce delay.
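A very rough sketch of that round-robin distribution, heavily simplified (the listener queue, the string-based "buffers", and the rotation policy here are illustrative stand-ins, not Flink's actual `LocalBufferPool` API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Toy model of a pool distributing recycled buffers among registered
 * listeners in round-robin order. Not Flink code; a sketch of the idea.
 */
class RoundRobinPool {
	// Listeners waiting for buffers, served in FIFO (round-robin) order.
	private final Deque<String> listeners = new ArrayDeque<>();
	private final List<String> assignments = new ArrayList<>();

	void registerListener(String channel) {
		listeners.addLast(channel);
	}

	/**
	 * On recycle, hand the buffer to the head listener, then rotate that
	 * listener to the tail so repeated recycles spread buffers fairly.
	 */
	void recycle(String buffer) {
		String channel = listeners.pollFirst();
		if (channel != null) {
			assignments.add(channel + "<-" + buffer);
			listeners.addLast(channel); // still waiting; keep it in rotation
		}
	}

	List<String> assignments() {
		return assignments;
	}
}
```

With two waiting listeners, three recycled buffers land as `ch1, ch2, ch1` rather than all on the first registrant; the delay mentioned above comes from buffers only moving on `recycle()`.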
On the `RemoteInputChannel` side, we currently implement another, more complicated way to request buffers relatively fairly. That is:
1. Define a parameter `numBuffersPerAllocation` that bounds how many buffers to request from the `LocalBufferPool` each time.
2. `min(numBuffersPerAllocation, backlog)` is the actual number requested from the `LocalBufferPool`, so one channel cannot occupy all the floating buffers even if its backlog is very large.
3. In general `numBuffersPerAllocation` should be larger than 1 to avoid a throughput decline. For example, if the floating buffers in the `LocalBufferPool` can satisfy all the requirements of the `RemoteInputChannel`, it is better to notify the producer of a batch of credits at once than one credit at a time over many notifications.
4. On the `LocalBufferPool` side, the `RemoteInputChannel` may still register as a listener after having already requested `numBuffersPerAllocation` buffers, when the number of available buffers plus `numBuffersPerAllocation` is less than `backlog`. It then has to wait for `LocalBufferPool#recycle()` to trigger distributing the remaining available buffers among all the listeners.
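The capped-allocation idea in steps 1 and 2 could be sketched as follows; only `numBuffersPerAllocation` and `backlog` come from the description above, while the pool and channel classes are toy stand-ins, not Flink's actual `LocalBufferPool`/`RemoteInputChannel`:

```java
/**
 * Toy sketch of capping each buffer request at
 * min(numBuffersPerAllocation, backlog). Hypothetical classes,
 * not Flink code.
 */
public class CappedRequestSketch {

	/** A toy floating-buffer pool holding a fixed number of buffers. */
	static class FloatingBufferPool {
		private int available;

		FloatingBufferPool(int available) {
			this.available = available;
		}

		/** Hands out up to 'requested' buffers, fewer if the pool runs dry. */
		int request(int requested) {
			int granted = Math.min(requested, available);
			available -= granted;
			return granted;
		}
	}

	/** A toy input channel applying the per-allocation cap. */
	static class Channel {
		private final int numBuffersPerAllocation;
		private int buffers;

		Channel(int numBuffersPerAllocation) {
			this.numBuffersPerAllocation = numBuffersPerAllocation;
		}

		/** Step 2: cap each request so one channel cannot drain the pool. */
		int onSenderBacklog(int backlog, FloatingBufferPool pool) {
			int granted = pool.request(Math.min(numBuffersPerAllocation, backlog));
			buffers += granted;
			return granted;
		}

		int buffers() {
			return buffers;
		}
	}

	public static void main(String[] args) {
		FloatingBufferPool pool = new FloatingBufferPool(10);
		Channel big = new Channel(4);   // backlog far larger than the cap
		Channel small = new Channel(4);

		// The channel with a backlog of 100 still only gets 4 buffers,
		// leaving floating buffers for the other channel.
		System.out.println(big.onSenderBacklog(100, pool));  // 4 (capped)
		System.out.println(small.onSenderBacklog(3, pool));  // 3 (backlog-limited)
	}
}
```

The unmet remainder of the large backlog is what step 4 covers: the channel re-registers as a listener and waits for `recycle()`.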
BTW, I did not clearly understand the formula you mentioned above, `backlog + initialCredit - currentCredit`. I think the initial credit should not be considered in the subsequent interactions. `backlog - currentCredit` reflects, in real time, the number of extra buffers needed for each interaction. I know `backlog - currentCredit` is not completely accurate because some credit notifications may already be in flight, but that evens out in the long run.
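A tiny numeric illustration of the difference between the two formulas (the numbers below are made up for the example, not taken from the PR):

```java
/**
 * Compares the two candidate formulas for how many extra buffers to
 * request. Variable names follow the discussion; values are invented.
 */
public class CreditFormulas {

	/** The formula quoted from the earlier review comment. */
	static int withInitialCredit(int backlog, int initialCredit, int currentCredit) {
		return backlog + initialCredit - currentCredit;
	}

	/** The formula proposed in this comment. */
	static int withoutInitialCredit(int backlog, int currentCredit) {
		return backlog - currentCredit;
	}

	public static void main(String[] args) {
		// Example: 5 unsent buffers in the subpartition, 2 initial credits,
		// 3 credits currently announced to the producer.
		System.out.println(withInitialCredit(5, 2, 3));   // 4
		System.out.println(withoutInitialCredit(5, 3));   // 2
	}
}
```

The gap between the two results is always exactly `initialCredit`, which is the point of contention here: whether those initial exclusive credits should keep being counted after the first interaction.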
What do you think of this way?
---