Yangze Guo created FLINK-20865:
----------------------------------
Summary: Prevent potential resource deadlock in fine-grained
resource management
Key: FLINK-20865
URL: https://issues.apache.org/jira/browse/FLINK-20865
Project: Flink
Issue Type: Improvement
Components: Runtime / Coordination
Reporter: Yangze Guo
Fix For: 1.13.0
Attachments: 屏幕快照 2021-01-06 下午2.32.57.png (screenshot, 2021-01-06 2:32:57 PM)
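[Figure (attached screenshot): example topology and slot assignment; left: both available slots assigned to Pipeline Region 0; right: the 2 slots assigned to A and C]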
The figure above shows a potential deadlock caused by scheduling dependencies.
For the given topology, the scheduler initially requests 4 slots, for A, B, C
and D. Assume only 2 slots are available. If both slots are assigned to
Pipeline Region 0 (as shown on the left), A and B finish first, then C and D
are executed, and finally E is executed. However, if the 2 slots are initially
assigned to A and C (as shown on the right), neither A nor C can finish,
because B and D, which consume the data A and C produce, never get slots.
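A minimal Python model of this example, under assumptions that the issue does not spell out: A -> B and C -> D are pipelined edges, so {A, B} and {C, D} are the two pipeline regions, and a task keeps its slot until its whole region finishes. The hypothetical `is_deadlocked` helper only checks whether any region can still obtain all of its slots; it is an illustration, not Flink scheduler code.

```python
# Hypothetical model of the example topology (an illustration, not Flink code).
# Assumption: A -> B and C -> D are pipelined edges, so each pair is a pipeline
# region whose tasks must all hold slots at the same time to make progress.
REGIONS = {"region0": {"A", "B"}, "region1": {"C", "D"}}


def is_deadlocked(assigned, total_slots):
    """Return True if no pipeline region can finish with this slot assignment.

    `assigned` is the set of tasks currently holding a slot. Assumption: a
    task keeps its slot until its whole region finishes, so slots held by a
    stuck region are never released.
    """
    free_slots = total_slots - len(assigned)
    for tasks in REGIONS.values():
        missing = tasks - assigned
        # The region can complete if its missing tasks fit into the free slots.
        if len(missing) <= free_slots:
            return False
    return True


print(is_deadlocked({"A", "B"}, 2))  # left side of the figure: False
print(is_deadlocked({"A", "C"}, 2))  # right side of the figure: True
```

With 2 slots, the left-hand assignment lets Pipeline Region 0 run to completion, while the right-hand one leaves every region waiting for a slot that is never freed.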
Currently, with coarse-grained resource management, the scheduler guarantees
that it always finishes fulfilling the requirements of one pipeline region
before it starts fulfilling those of another. This means the deadlock case
shown on the right of the above figure can never happen.
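The guarantee can be illustrated with a small sketch. It assumes, purely for illustration, that identical slots are handed out in the order the requests are issued; the helper names and the two regions {A, B} and {C, D} are hypothetical. Issuing one region's requests back to back puts the first slots into a single region, whereas an interleaved order can split them across regions.

```python
# Hypothetical sketch, not Flink code: with identical slots, assume the first
# `total_slots` requests (in issue order) are the ones that get fulfilled.
REGIONS = [("region0", ["A", "B"]), ("region1", ["C", "D"])]


def region_by_region_order(regions):
    """Coarse-grained style: issue all requests of a region before the next."""
    return [task for _, tasks in regions for task in tasks]


def some_region_fully_granted(request_order, regions, total_slots):
    """True if at least one region received slots for all of its tasks."""
    granted = set(request_order[:total_slots])
    return any(set(tasks) <= granted for _, tasks in regions)


print(some_region_fully_granted(region_by_region_order(REGIONS), REGIONS, 2))  # True
print(some_region_fully_granted(["A", "C", "B", "D"], REGIONS, 2))  # False
```

In this model, as long as the first region fits into the available slots, the region-by-region order always yields one fully deployed region, so the right-hand case of the figure does not arise.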
However, fine-grained resource management offers no such guarantee. Since the
resource requirements of different SSGs (slot sharing groups) can differ,
there is no control over which requirements are fulfilled first when there are
not enough resources to fulfill all of them. Therefore, it is not always
possible to fulfill one pipeline region before another.
To solve this problem in fine-grained resource management, we can make the
scheduler defer requesting slots for other SSGs until the requirements of the
current SSG are fulfilled, at the cost of longer scheduling time.
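A minimal sketch of the proposed deferral, in which every name and number is hypothetical: each SSG is reduced to (name, pipeline region, resource units) with one task per SSG, requests are issued in region order, and a resource manager that receives all requests at once is assumed to serve the smallest ones first, standing in for any fulfillment order the scheduler cannot control. With 3 units available, concurrent requests leave both regions partially deployed, while deferring each request until the previous one is fulfilled completes region 0 first.

```python
# Hypothetical SSGs (not Flink API): (name, pipeline region, resource units).
# Assumption: requests are issued in this order, i.e. region by region.
SSGS = [("A", 0, 1), ("B", 0, 2), ("C", 1, 1), ("D", 1, 2)]


def fulfill_concurrently(ssgs, capacity):
    """All requests are sent at once; this hypothetical resource manager
    serves the smallest requests first, ignoring pipeline regions."""
    granted, free = [], capacity
    for name, _, units in sorted(ssgs, key=lambda s: s[2]):
        if units <= free:
            granted.append(name)
            free -= units
    return granted


def fulfill_deferred(ssgs, capacity):
    """The next request is issued only after the current one is fulfilled."""
    granted, free = [], capacity
    for name, _, units in ssgs:
        if units > free:
            break  # wait here: all later requests stay deferred
        granted.append(name)
        free -= units
    return granted


def fully_granted_regions(granted, ssgs):
    """Regions whose SSGs have all been granted resources."""
    regions = {}
    for name, region, _ in ssgs:
        regions.setdefault(region, set()).add(name)
    return {r for r, names in regions.items() if names <= set(granted)}


concurrent = fulfill_concurrently(SSGS, 3)  # ["A", "C"]
deferred = fulfill_deferred(SSGS, 3)  # ["A", "B"]
print(fully_granted_regions(concurrent, SSGS))  # set(): potential deadlock
print(fully_granted_regions(deferred, SSGS))  # {0}: region 0 can run
```

The price of this sketch is one request round trip per SSG instead of a single batch, which corresponds to the extra scheduling time mentioned above.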
--
This message was sent by Atlassian Jira
(v8.3.4#803005)