zhijiangW commented on a change in pull request #7911: [FLINK-11082][network]
Fix the logic of getting backlog in sub partition
URL: https://github.com/apache/flink/pull/7911#discussion_r264508671
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultSubpartition.java
##########
@@ -116,52 +115,58 @@ protected Throwable getFailureCause() {
public abstract boolean isReleased();
- /**
- * Gets the number of non-event buffers in this subpartition.
- *
- * <p><strong>Beware:</strong> This method should only be used in tests
in non-concurrent access
- * scenarios since it does not make any concurrency guarantees.
- */
- @VisibleForTesting
- public int getBuffersInBacklog() {
- return buffersInBacklog;
- }
-
/**
* Makes a best effort to get the current size of the queue.
* This method must not acquire locks or interfere with the task and
network threads in
* any way.
*/
public abstract int unsynchronizedGetNumberOfQueuedBuffers();
+ /**
+ * Gets the number of non-event buffers in this subpartition.
+ */
+ public abstract int getBuffersInBacklog();
+
+ /**
+ * @param lastBufferAvailable whether the last buffer in this
subpartition is available for consumption
+ * @return the number of non-event buffers in this subpartition
+ */
+ protected int getBuffersInBacklog(boolean lastBufferAvailable) {
Review comment:
That is very good question. We define the variable `buffersInBacklog` for
counting the non-event buffers in the queue. `getNumberOfFinishedBuffers` and
`isAvailableUnsafe` are counting both event and non-event buffers. So they can
not be unified directly. I even tried another way by counting only event
buffers instead of current `buffersInBacklog`, but it still does not simplify
the logic. Especially for `SpillableSubpartition` we could not get total
buffers directly from the memory queue, then we still need non-event buffer
counting even though we count the event buffers.
I also find that the current conditions seem a bit complicated to maintain
in different processes, such as `shouldNotifyDataAvailable`, `isAvailable`,
`getBuffersInBacklog`, etc. And it would be better if we can integrate
something to make related conditions unification. After we have a good idea for
this issue, I would like to make changes. :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services