Zhijiang created FLINK-16645:
--------------------------------
Summary: Limit the maximum backlogs in subpartitions for data skew
case
Key: FLINK-16645
URL: https://issues.apache.org/jira/browse/FLINK-16645
Project: Flink
Issue Type: Sub-task
Components: Runtime / Network
Reporter: Zhijiang
Fix For: 1.11.0
In the case of data skew, most of the buffers in partition's LocalBufferPool
are probably requested away and accumulated in certain subpartition, which
would increase in-flight data to slow down the barrier alignment.
We can set up a proper config to control how many backlogs are allowed for one
subpartition. If one subpartition reaches this threshold, it will make the
buffer pool unavailable which blocks task processing continuously. Then we can
reduce the in-flight data for speeding up checkpoint process a bit and not
impact on the performance.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)