[ 
https://issues.apache.org/jira/browse/FLINK-14472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhijiang updated FLINK-14472:
-----------------------------
    Description: 
Currently the back-pressure monitor relies on detecting task threads that are 
stuck in `requestBufferBuilderBlocking`. There are two cases that cause 
back-pressure at the moment:
 * There are no available buffers in `LocalBufferPool` and the quota granted 
by the global pool is also exhausted. Then we need to wait for buffers to be 
recycled to `LocalBufferPool`.
 * There are no available buffers in `LocalBufferPool`, but the quota has not 
been used up. The request to the global pool blocks because the global pool 
has no available buffers. Then we need to wait for buffers to be recycled to 
the global pool.
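
The two cases above can be told apart from four counters. The following is only a sketch for illustration: `BackPressureCauseSketch`, `Cause`, and all parameter names are hypothetical and are not part of Flink.

```java
/** Hypothetical illustration of the two back-pressure cases; not Flink code. */
final class BackPressureCauseSketch {

    enum Cause { NONE, LOCAL_QUOTA_EXHAUSTED, GLOBAL_POOL_EMPTY }

    /**
     * @param localAvailable  buffers currently idle in the local pool
     * @param localRequested  buffers the local pool has taken from the global pool
     * @param localQuota      maximum number of buffers the local pool may take
     * @param globalAvailable buffers currently idle in the global pool
     */
    static Cause classify(int localAvailable, int localRequested,
                          int localQuota, int globalAvailable) {
        if (localAvailable > 0) {
            return Cause.NONE; // a buffer can be handed out immediately
        }
        if (localRequested >= localQuota) {
            // Case 1: must wait for recycling to the local pool.
            return Cause.LOCAL_QUOTA_EXHAUSTED;
        }
        if (globalAvailable == 0) {
            // Case 2: quota left, but must wait for recycling to the global pool.
            return Cause.GLOBAL_POOL_EMPTY;
        }
        return Cause.NONE; // quota left and the global pool can serve the request
    }

    public static void main(String[] args) {
        System.out.println(classify(0, 4, 4, 10));
        System.out.println(classify(0, 2, 4, 0));
    }
}
```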

We are implementing non-blocking network output in FLINK-14396, so the 
back-pressure monitor should be adjusted accordingly once non-blocking output 
is used in practice.

Specifically, we want to stop monitoring by analyzing the task thread stack, 
which has some drawbacks that were discussed before:
 * If `requestBuffer` is not triggered by the task thread, the current monitor 
does not work in practice.
 * The current monitor is heavy-weight and fragile because it needs to 
understand details of the `LocalBufferPool` implementation.

We could provide a transparent method that lets the monitor caller get the 
back-pressure result directly and hides the implementation details in 
`LocalBufferPool`.
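
One possible shape of such a method is sketched below. This is an assumption for illustration, not the proposed implementation: `SketchGlobalPool`, `SketchLocalPool`, and `isBackPressured()` are hypothetical names, and the pools only count buffers instead of managing real memory segments.

```java
/** Hypothetical stand-in for the global pool; it only counts buffers. */
final class SketchGlobalPool {
    private int available;

    SketchGlobalPool(int total) { this.available = total; }

    synchronized boolean tryTake() {
        if (available == 0) {
            return false;
        }
        available--;
        return true;
    }

    synchronized boolean hasAvailable() { return available > 0; }
}

/** Hypothetical stand-in for a local pool with a non-blocking request. */
final class SketchLocalPool {
    private final SketchGlobalPool global;
    private final int quota; // max buffers this pool may take from the global pool
    private int taken;       // buffers taken from the global pool so far
    private int idle;        // recycled buffers waiting in this pool

    SketchLocalPool(SketchGlobalPool global, int quota) {
        this.global = global;
        this.quota = quota;
    }

    /** Non-blocking request: returns false instead of parking the caller. */
    synchronized boolean requestBuffer() {
        if (idle > 0) {
            idle--;
            return true;
        }
        if (taken < quota && global.tryTake()) {
            taken++;
            return true;
        }
        return false;
    }

    /** Returns a buffer to this pool. */
    synchronized void recycle() { idle++; }

    /**
     * Point-in-time answer for the monitor; the caller needs no thread stack
     * sampling and no knowledge of which of the two cases applies.
     */
    synchronized boolean isBackPressured() {
        return idle == 0 && (taken >= quota || !global.hasAvailable());
    }
}
```

With this shape the monitor only calls `isBackPressured()`, so it keeps working when the buffer request is not issued by the task thread.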

  was:
Currently back-pressure monitor relies on detecting task threads that are stuck 
in `requestBufferBuilderBlocking`. There are actually two cases to cause 
back-pressure ATM:
 * There are no available buffers in `LocalBufferPool` and all the given quotas 
from global pool are also exhausted. Then we need to wait for buffer recycling 
to `LocalBufferPool`.
 * No available buffers in `LocalBufferPool`, but the quota has not been used 
up. While requesting buffer from global pool, it is blocked because of no 
available buffers in global pool. Then we need to wait for buffer recycling to 
global pool.

We already implemented the non-blocking output for the first case in 
[FLINK-14396|https://issues.apache.org/jira/browse/FLINK-14396], and we expect 
the second case done together with adjusting the back-pressure monitor which 
could check for `RecordWriter#isAvailable` instead.


> Implement back-pressure monitor with non-blocking outputs
> ---------------------------------------------------------
>
>                 Key: FLINK-14472
>                 URL: https://issues.apache.org/jira/browse/FLINK-14472
>             Project: Flink
>          Issue Type: Task
>          Components: Runtime / Network
>            Reporter: zhijiang
>            Assignee: Yingjie Cao
>            Priority: Minor
>             Fix For: 1.10.0
>
>
> Currently the back-pressure monitor relies on detecting task threads that 
> are stuck in `requestBufferBuilderBlocking`. There are two cases that cause 
> back-pressure at the moment:
>  * There are no available buffers in `LocalBufferPool` and the quota granted 
> by the global pool is also exhausted. Then we need to wait for buffers to be 
> recycled to `LocalBufferPool`.
>  * There are no available buffers in `LocalBufferPool`, but the quota has not 
> been used up. The request to the global pool blocks because the global pool 
> has no available buffers. Then we need to wait for buffers to be recycled to 
> the global pool.
>
> We are implementing non-blocking network output in FLINK-14396, so the 
> back-pressure monitor should be adjusted accordingly once non-blocking 
> output is used in practice.
>
> Specifically, we want to stop monitoring by analyzing the task thread stack, 
> which has some drawbacks that were discussed before:
>  * If `requestBuffer` is not triggered by the task thread, the current 
> monitor does not work in practice.
>  * The current monitor is heavy-weight and fragile because it needs to 
> understand details of the `LocalBufferPool` implementation.
>
> We could provide a transparent method that lets the monitor caller get the 
> back-pressure result directly and hides the implementation details in 
> `LocalBufferPool`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
