[GitHub] [flink] pnowojski commented on a change in pull request #11567: [FLINK-16645] Limit the maximum backlogs in subpartitions

GitBox Wed, 08 Apr 2020 03:15:56 -0700

pnowojski commented on a change in pull request #11567: [FLINK-16645] Limit the 
maximum backlogs in subpartitions
URL: https://github.com/apache/flink/pull/11567#discussion_r405415066


 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultPartition.java
 ##########
 @@ -375,4 +397,29 @@ void onConsumedSubpartition(int subpartitionIndex) {
        private void checkInProduceState() throws IllegalStateException {
                checkState(!isFinished, "Partition already finished.");
        }
+
+       /**
+        * Check whether all subpartitions' backlogs are less than the limitation of max backlogs, and make this partition
+        * available again if yes.
+        */
+       public void notifyDecreaseBacklog(int buffersInBacklog) {
+               if (buffersInBacklog == maxBuffersPerChannel) {
+                       if (--unavailableSubpartitionsCount == 0) {
+                               CompletableFuture<?> toNotify = availabilityHelper.getUnavailableToResetAvailable();
+                               toNotify.complete(null);
+                       }
+               }
+       }
+
+       /**
+        * Check whether any subpartition's backlog exceeds the limitation of max backlogs, and make this partition
+        * unavailable if yes.
+        */
+       public void notifyIncreaseBacklog(int buffersInBacklog) {
+               if (buffersInBacklog == maxBuffersPerChannel + 1) {
+                       if (++unavailableSubpartitionsCount == 1) {
+                               availabilityHelper.resetUnavailable();
+                       }
+               }
+       }
 
 Review comment:
   Yes, I guess you are right.
   
   > But the problem here is we may have hundreds of futures combined together 
in the subpartitionsFuture if user sets a low value to the number of max 
buffers.
   
   I think that wouldn't be a serious issue. Once a single subpartition is 
backpressured because of data skew, data processing should pause. So as long as 
we respect the availability, there should be at most one blocked 
subpartition. 
   
   But it would be an issue nonetheless. It could happen, for example, with a 
flatMap operator that produces a lot of output in a single call.
   
   So the `subpartitionsFuture` would have to maintain a 
`List<CompletableFuture>` to keep track of all of the blocked subpartitions. 
With this in mind, I'm not entirely sure this approach would be 
easier or better than going through `LocalBufferPool`. Maybe it would be better to 
have a single source of truth for output availability (`LocalBufferPool`), 
instead of distributing the logic among `LocalBufferPool` and subpartitions?
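   
   To make the bookkeeping cost concrete, here is a rough sketch of what 
tracking one future per blocked subpartition could look like. It is not Flink 
code: `BlockedSubpartitionTracker` and all of its method names are hypothetical, 
it uses a map rather than a list so a subpartition can be unblocked by index, 
and it ignores thread safety entirely. The only real API it relies on is 
`CompletableFuture.allOf` from the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical sketch (not part of Flink): keeps one pending future per
 * subpartition whose backlog exceeds the limit. Not thread-safe.
 */
final class BlockedSubpartitionTracker {
    private final int maxBuffersPerChannel;
    // Assumption: subpartitions are identified by their index.
    private final Map<Integer, CompletableFuture<Void>> blocked = new HashMap<>();

    BlockedSubpartitionTracker(int maxBuffersPerChannel) {
        this.maxBuffersPerChannel = maxBuffersPerChannel;
    }

    /** Call after a buffer was added; backlog is the new backlog size. */
    void onBacklogIncreased(int subpartition, int backlog) {
        if (backlog > maxBuffersPerChannel) {
            // Register a pending future only once per blocked subpartition.
            blocked.computeIfAbsent(subpartition, k -> new CompletableFuture<>());
        }
    }

    /** Call after a buffer was consumed; backlog is the new backlog size. */
    void onBacklogDecreased(int subpartition, int backlog) {
        if (backlog <= maxBuffersPerChannel) {
            CompletableFuture<Void> future = blocked.remove(subpartition);
            if (future != null) {
                future.complete(null);
            }
        }
    }

    /**
     * Snapshot: completes once every subpartition that is blocked right now
     * has been unblocked. Subpartitions blocked later are not included.
     */
    CompletableFuture<Void> availableFuture() {
        return CompletableFuture.allOf(
                blocked.values().toArray(new CompletableFuture<?>[0]));
    }

    int blockedCount() {
        return blocked.size();
    }
}
```

   Even in this simplified form, every availability check has to combine all 
of the pending futures, and the combined future is only a snapshot. That is the 
extra state I would rather not spread across subpartitions if 
`LocalBufferPool` can own it.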

